Abstract

This paper is concerned with constructing, for each expression in a given
program text, a symbolic expression whose value is equal to the value of the
text expression for all executions of the program. A cover is a mapping from text
expressions to such symbolic expressions. Covers can be used for constant
propagation, code motion, and a variety of other program optimizations. Covers
can also be used as an aid in symbolic program execution and for finding loop
invariants for program verification. We describe a direct (non-iterative)
algorithm for computing a cover. The cover computed by our algorithm is
characterized as the minimal solution of a certain fixed point equation, and is
in general a better cover than might be computed by iteration methods (which can
compute fixed point covers which are not minimal). Our algorithm is efficient and
applicable to all flow graphs. A variant of our algorithm is implemented by [KK]
in an optimizing compiler for Pascal. [R1] extends our algorithm to symbolic
analysis of programs with records, such as LISP and PASCAL programs.

1.  INTRODUCTION


Let Π be a computer program to which we wish to apply various optimizations.
We begin by formulating a global flow model for Π as in [H] and [MS].

1.1  The  Global  Flow  Model 

All intraprogram control flow is reduced to a digraph indicating which blocks
of assignment statements may be reached from which others (but giving no
information about the conditions under which such branches might occur). The
control flow graph F = (N,A,s) is a flow graph whose nodes are called blocks (to
distinguish it from other graphs considered in our paper) and which is rooted at
the distinguished start block s ∈ N. A control path is a path in F. Executions of
the program correspond to control paths beginning at the start block s, although
not every such path in this graph need correspond to a possible execution of the
program Π.

The only statements in the programming language retained in the model are
assignment statements. An assignment statement of Π is of the form X := ℰ.
The left-hand side of the assignment is a program variable taken from the set
{X,Y,Z,...}. The right-hand side is an expression ℰ built from program variables
and fixed sets C of constant symbols and Θ of function symbols.

Each node n ∈ N contains a block of assignment statements. These blocks do
not contain conditional or branch statements; control information is specified by
the control flow graph as in [C]. A program variable occurring within only a
single block n ∈ N is local to n. Let Σ be the set of program variables
occurring within Π and not local to any block. For each program variable X ∈ Σ
and block n ∈ N-{s} we introduce as in [RT] the input variable X^n to denote
the value of X on entry to block n. We use the symbol X^s, considered to be a
constant symbol, to denote the value of X on entry to the program Π at the
start block s.


Let EXP be the set of expressions built from input variables, C, and Θ.
Thus, ℰ ∈ EXP is a finite expression consisting of either a constant symbol
c ∈ C, an input variable X^n representing the value of program variable X on
input to block n, or a k-adic function symbol θ ∈ Θ prefixed to a k-tuple of
expressions in EXP. Thus ℰ is a term in a first order language; it is an
expression containing no predicates and built from function symbols, constant
symbols, and variables on input to particular blocks of assignment statements.

For each X ∈ Σ and node n ∈ N where X is assigned to, let the output
expression ℰ(X,n) be a (canonically chosen) expression in EXP for the value of X
on exit from block n in terms of constants and input variables at block n. A text
expression t is an output expression or a subexpression of an output expression.
Note that each text expression t is a substitution instance of an expression on
the right hand side of an assignment statement of Π. Let TEXT ⊂ EXP be the set of
text expressions for program Π.
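As a minimal sketch (not part of the original paper), output expressions might be computed by one symbolic pass over a block. The encoding is an assumption made here for illustration: constants are Python integers, input variables are strings such as 'X^n', and function applications are prefix tuples. The function name output_expressions is hypothetical. The block used is the three-statement example block n that the text gives next.

```python
def output_expressions(block, label):
    """Symbolically execute a block of assignment statements.

    block: list of (target, rhs) pairs; rhs is an int constant, a program
    variable name (str), or a prefix tuple (op, arg, ...).
    Returns {variable: output expression over input variables 'X^label'}.
    """
    env = {}  # current symbolic value of each variable assigned so far

    def subst(e):
        if isinstance(e, str):                 # program variable
            # Not yet assigned in this block: it denotes the input variable.
            return env.get(e, f"{e}^{label}")
        if isinstance(e, tuple):               # function application
            return (e[0],) + tuple(subst(a) for a in e[1:])
        return e                               # constant symbol

    for target, rhs in block:
        env[target] = subst(rhs)
    return env


block_n = [("X", ("-", "X", 1)), ("Y", ("+", "Y", 4)), ("Z", ("*", "X", "Y"))]
out = output_expressions(block_n, "n")
print(out["Z"])  # ('*', ('-', 'X^n', 1), ('+', 'Y^n', 4))
```

The sketch does not reduce the resulting expressions; the paper assumes blocks are already reduced.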

For example, let n be the block of code:

    X := X - 1 ;
    Y := Y + 4 ;
    Z := X * Y .

Then ℰ(Z,n) = (X^n - 1) * (Y^n + 4) (or in the more proper prefix notation,
(* (- X^n 1) (+ Y^n 4))) is the text expression associated with the string of text
"X * Y" at the last assignment statement of n.

An interpretation for the program Π is an ordered pair (U,I). The universe
U contains (among other things) a distinct value I(c) for each constant symbol
c ∈ C. For each k-adic function symbol θ ∈ Θ, there is a unique k-adic operator
I(θ) which is a partial mapping from k-tuples over U into U. We assume
I(c_1) ≠ I(c_2) for each distinct c_1, c_2 ∈ C (every value has at most one name).
For example, a program is in the arithmetic domain if it has the interpretation
(Z,I_a) where Z is the set of integers and I_a maps symbols to the arithmetic
operations addition, subtraction, multiplication, and integer division.

An expression in EXP is put in reduced form by repeatedly substituting
for each subexpression of the form (θ c_1 ... c_k), that constant symbol c such
that I(c) = I(θ)(I(c_1),...,I(c_k)), until no further substitutions of this kind
can be made. We assume the blocks are reduced in the sense of Aho and Ullman
[AU1], so each text expression is a reduced expression. We also assume that
the output expressions ℰ(X,n) are reduced (and thus uniquely determined).
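The reduction rule can be sketched as bottom-up constant folding. This is an illustrative sketch only, assuming the arithmetic domain, prefix tuples for applications, Python integers for constant symbols, and strings for input variables. Python's floor division stands in for integer division, which is an assumption about rounding that the text does not specify.

```python
import operator

# Assumed interpretation I of the function symbols (arithmetic domain).
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def reduce_expr(e):
    """Put a prefix-tuple expression in reduced form, bottom-up."""
    if not isinstance(e, tuple):
        return e                       # constant (int) or input variable (str)
    op = e[0]
    args = [reduce_expr(a) for a in e[1:]]
    if all(isinstance(a, int) for a in args):
        try:
            return OPS[op](*args)      # (theta c1 ... ck) becomes c
        except ZeroDivisionError:
            pass                       # I(theta) is partial: leave unreduced
    return (op, *args)


print(reduce_expr(("+", ("*", 2, 3), 1)))                    # 7
print(reduce_expr(("*", ("-", 3, 1), ("+", "Y^n", 4))))      # ('*', 2, ('+', 'Y^n', 4))
```

Only applications whose arguments are all constants are folded, so an expression containing an input variable is reduced around that variable but never eliminated.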

A global flow system p is a quadruple (F,Σ,U,I) where F is the
control flow graph of Π, Σ is the set of program variables and (U,I) is
an interpretation. The next definitions deal with a fixed global flow system
p = (F,Σ,U,I).


1.2  Covers 

The  utility  of  the  global  flow  model  is  that  many  program  analysis  and 
improvement  problems  may  be  formulated  as  combinatorial  problems  on  digraphs. 
The  fundamental  program  analysis  problem  of  interest  here  is  the  discovery, 
for each expression t in the text of the program, of a symbolic expression
ℰ for the value of t which holds for all executions of the program.

Let ℰ be an expression in EXP and let p be a control path. We give
a recursive definition for VALUE(ℰ,p), the expression for the value of ℰ
in the context of a program execution on this control path p. VALUE(ℰ,p)
is defined formally as follows:

i)  if p = (s) then VALUE(ℰ,p) is the reduced expression derived
from ℰ.

ii)  otherwise, if p = p'·(m,n) then VALUE(ℰ,p) = VALUE(ℰ',p') where
ℰ' is the expression obtained from ℰ by substituting the output expression
ℰ(X,m) for each input variable X^n, and putting the result in reduced form.
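The recursive definition of VALUE can be unrolled into a backward walk along the control path. The following is a sketch under stated assumptions: expressions are prefix tuples, input variables are strings of the form 'X^n', a block that does not assign X is treated as passing X through (the dummy assignment X := X), and only +, - and * are interpreted. The names value, reduce_expr and the example program are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def reduce_expr(e):
    """Fold every application whose arguments are all constants."""
    if not isinstance(e, tuple):
        return e
    args = [reduce_expr(a) for a in e[1:]]
    if all(isinstance(a, int) for a in args):
        return OPS[e[0]](*args)
    return (e[0], *args)


def value(e, path, out):
    """VALUE(e, path), where path lists blocks from the start block onward.

    out: {block: {variable: output expression}}.
    """
    def subst(x, m, n):
        if isinstance(x, str):                     # input variable 'X^blk'
            var, blk = x.split("^")
            # Replace X^n by the output expression of X at predecessor m.
            return out[m].get(var, f"{var}^{m}") if blk == n else x
        if isinstance(x, tuple):
            return (x[0], *(subst(a, m, n) for a in x[1:]))
        return x                                   # constant symbol

    # Case ii): peel edges (m, n) off the end of the path, last edge first.
    for m, n in zip(reversed(path[:-1]), reversed(path[1:])):
        e = reduce_expr(subst(e, m, n))
    return reduce_expr(e)                          # case i): p = (s)


out = {"s": {"X": 1, "Y": 2}, "m": {"X": ("+", "X^m", 1)}, "n": {}}
e = ("+", "X^n", "Y^n")
print(value(e, ["s", "m", "n"], out))  # 4
print(value(e, ["s", "n"], out))       # 3
```

In this hypothetical program the two control paths yield different values, so no constant has the same VALUE as X^n + Y^n on every path to n.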

We now define origin(ℰ), where ℰ ∈ EXP, which intuitively is the earliest
block at which all the quantities referred to in ℰ are defined. Let
N(ℰ) = {n ∈ N | some input variable X^n occurs in ℰ}. If N(ℰ) is empty then
origin(ℰ) is the start block s, and otherwise origin(ℰ) is the earliest
(i.e., closest to s) block in N(ℰ) relative to the dominator ordering
(see Appendix I). The origin need not exist for arbitrary expressions in EXP,
but will be well-defined in all the relevant cases (i.e., origin exists for all
text expressions and their covers). Note that if a text expression t contains
no input variables then origin(t) = s, and otherwise origin(t) is the block
in N where t is located.

An expression ℰ ∈ EXP covers a text expression t if VALUE(t,p) = VALUE(ℰ,p)
for every control path p from s to origin(t). Hence, if ℰ covers t
then ℰ correctly represents the value of t on every execution of program Π.
(See Figure 2.)

A cover is a mapping ψ from the text expressions TEXT to expressions in
EXP in reduced form such that for each text expression t, ψ(t) covers t.

Note that the origin of any cover ℰ of a text expression t is always
well defined since the elements of N(ℰ) will form a chain relative to the
dominator ordering.

LEMMA 1. If ℰ ∈ EXP covers text expression t then origin(ℰ) dominates
origin(t).

Proof by contradiction. Suppose origin(ℰ) does not dominate origin(t). Then
ℰ must contain an input variable X^n such that n is not a dominator of
origin(t). Hence there is an n-avoiding control path p from the start block
s to origin(t) such that VALUE(ℰ,p) contains X^n but VALUE(t,p) does
not, so VALUE(ℰ,p) ≠ VALUE(t,p), contradicting the assumption that ℰ covers t. □

We now define a partial ordering of covers. For each pair of covers
ψ_1 and ψ_2, ψ_1 ≤ ψ_2 iff origin(ψ_1(t)) dominates origin(ψ_2(t)) for all text
expressions t.

We  wish  to  compute  a  cover  minimal  with  respect  to  this  partial  ordering. 
Unfortunately,  Appendix  II  shows  this  is  an  undecidable  problem.  It  follows  that 
we  must  look  for  heuristic  methods  for  good,  but  not  minimal  covers.  Subsection 
1.4  defines  a  class  of  covers  which  are  fixed  points  of  an  iterative  process. 

The  minimal  fixed  point  cover  is  efficiently  computed  by  our  direct  algorithm 
given  in  Section  2.  The  next  subsection  describes  applications  of  covers  to 
program  optimization. 

1.3  Applications  of  Covers 

We  give  below  a  number  of  program  analysis  problems  and  optimizations  which 
reduce  to  the  problem  of  determining  covers  of  text  expressions.  These  examples 
indicate  that  computing  covers  is  of  fundamental  importance  to  program  analysis. 
[RL] (which is a preliminary draft of this paper) and the recent paper of [RT]
were the first to consider the problem of computing covers. [KK] have made
practical application of our work in the implementation of an optimizing
compiler for Pascal.

a)  Constant propagation (or folding) is the substitution of the appropriate
constant symbols for text expressions covered by constants (see [Ki]).

b)  More generally, a text expression t located at block n is redundant
if on all paths from the start block to n another text expression t' yields
a computation equivalent to that of t. Thus t may be replaced by a load
operation from a temporary address containing the result of some such equivalent
previous computation (see [C], [CA], [E], [G], [FKU], [U]). Thus it would suffice
that each such t' has the same cover as t.

c)  Code motion is the process of moving code as far as possible out of
cycles in the control flow graph (i.e., out of program loops). The birth point
of text expression t is the earliest block n in the control flow graph
(relative to the partial ordering of blocks by domination with the start block
first) where the computation of t is defined. Any block occurring between
(relative to this domination ordering) n and the original location of t
has a cover for t in terms of covers for the variables at n. This best
possible birth point for t is the origin of the minimal covering expression
for t. Hence code motion is fundamentally related to the computation of
covers. The earliest such block m, with the further property that the
computation of t can induce no new errors at that block m, is called the safe
point of t; the computation of t may safely be moved to any block between m and
loc(t). The text expression appropriate at the chosen block may not be
lexically identical to t, but is given by the cover of t in terms of the
variables on input to that block. Preliminary work on simple motions, primarily
emphasizing safety, but not considering birth points, is given in [CA], [G] and
[E]. [R2] gives a complete formulation of code motions considering birth points
and safe points, also considering the movement as far as possible out of cycles,
and gives an efficient algorithm for carrying out these code motion
optimizations.

d)  A cover for a variable in a program loop is a loop invariant (see
[FU] and [W]). The discovery of loop invariants is often crucial for proving
the correctness of a program; see for example [Ul], [KM] and [HK].

e)  Symbolic execution of a program as described in [K2] and [CHT], and
program transformation as described in [L] and [SHKN], generally require a
powerful program simplifier. Domain specific simplifiers such as [NO] may
require the solution of logical decision problems which require much time and
space. The covers give domain independent simplifications of program text,
which can be computed efficiently. A practical simplification system may use
a combination of these techniques.

1.4  A  Compatible  Class  of  Covers 

In  Appendix  II  we  show  that  the  problem  of  computing  minimal  covers  over 
arithmetic  domains  is  unsolvable.  Here  we  consider  a  class  of  covers  that  can 
be characterized by fixed point equations. These covers can be computed
inefficiently by an iterative algorithm (later in this paper we describe how to
efficiently compute them by our direct algorithm). To iteratively construct
this class of covers, we would first take a pass through the program and
construct a mapping ψ_0 from text expressions to EXP; ψ_0 may not be a cover
but has the property that for all text expressions t,

    VALUE(ψ_0(t),p) = VALUE(t,p)

for some (rather than all) control paths p from s to origin(t). The
algorithm would then iteratively compare possible covering expressions of input
variables at particular blocks to the corresponding output expressions of
preceding blocks, and propagate the results to predecessor blocks. More
precisely, for any mapping ψ from text expressions to EXP, let Ψ(ψ) be the
mapping ψ' from text expressions to EXP such that for each input variable X^n

    ψ'(X^n) = ℰ,    if ℰ = ψ(ℰ(X,m)) for all blocks m immediately preceding
                    n in the control flow graph F,
            = X^n,  otherwise,

and ψ'(t) is the reduced expression derived from text expression t after
substituting ψ'(X^n) for each input variable X^n occurring in t. This
iterative algorithm then computes Ψ^k(ψ_0) for k = 1,2,... until a fixed point
of Ψ is obtained. Note that Ψ maps covers to covers; but Ψ need not be
monotonic, i.e., for some cover ψ and text expression t, it may not be that

    Ψ(ψ)(t) ≤ ψ(t).
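The iteration just described can be sketched as follows. This is only an illustration under assumptions made here: a mapping is stored as a dictionary on input variables (its value on any other text expression follows by substitution and reduction), expressions are prefix tuples, and the two-block program with a self-loop at n is hypothetical. The function names apply_functional and iterate, and the iteration limit, are not from the paper.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def reduce_expr(e):
    """Fold every application whose arguments are all constants."""
    if not isinstance(e, tuple):
        return e
    args = [reduce_expr(a) for a in e[1:]]
    if all(isinstance(a, int) for a in args):
        return OPS[e[0]](*args)
    return (e[0], *args)


def apply_functional(psi, out, preds):
    """One application of the update functional; psi maps 'X^n' to expressions."""
    def cover(e):                      # psi applied to a text expression
        if isinstance(e, str):
            return psi.get(e, e)       # start-block inputs act as constants
        if isinstance(e, tuple):
            return (e[0], *(cover(a) for a in e[1:]))
        return e

    new = {}
    for xn in psi:
        var, n = xn.split("^")
        # psi of the output expression of var at each predecessor m of n.
        cands = {reduce_expr(cover(out[m].get(var, f"{var}^{m}")))
                 for m in preds[n]}
        new[xn] = cands.pop() if len(cands) == 1 else xn
    return new


def iterate(psi, out, preds, limit=100):
    """Apply the functional until nothing changes (limit is a safety bound)."""
    for _ in range(limit):
        nxt = apply_functional(psi, out, preds)
        if nxt == psi:
            return psi
        psi = nxt
    raise RuntimeError("no fixed point reached within the limit")


# Block s: X := 1; Y := 0.   Block n (a loop): Y := Y + X.
out = {"s": {"X": 1, "Y": 0}, "n": {"Y": ("+", "Y^n", "X^n")}}
preds = {"n": ["s", "n"]}

psi0 = {"X^n": 1, "Y^n": 0}            # correct along the path (s, n) only
print(iterate(psi0, out, preds))       # {'X^n': 1, 'Y^n': 'Y^n'}

identity = {"X^n": "X^n", "Y^n": "Y^n"}
print(iterate(identity, out, preds))   # {'X^n': 'X^n', 'Y^n': 'Y^n'}
```

In this small example the identity mapping is already a fixed point, yet it is weaker than the one reached from psi0, which covers X^n by the constant 1. This illustrates why a fixed point found by iteration need not be the minimal one.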


THEOREM 1. If ψ is a fixed point of Ψ then ψ is a cover.

Proof. We must show VALUE(ψ(t),p) = VALUE(t,p) for all text expressions t
and control paths p from s to the block where t is located. Suppose not, and
let p be the shortest control path from s to a block n where there is located a
text expression t such that

    VALUE(ψ(t),p) ≠ VALUE(t,p).

Thus t must contain an input variable X^n such that

    VALUE(ψ(X^n),p) ≠ VALUE(X^n,p).

Clearly, ψ(X^n) ≠ X^n. Let m be the next to last block in p, so p = p'·(m,n).
By definition of Ψ, ψ(X^n) = ψ(ℰ(X,m)). Since ψ(X^n) contains no input
variables at n,

    VALUE(ψ(X^n),p) = VALUE(ψ(X^n),p')
                    = VALUE(ψ(ℰ(X,m)),p'),  since ψ(X^n) = ψ(ℰ(X,m)),
                    = VALUE(ℰ(X,m),p'),     by the induction hypothesis,
                    = VALUE(X^n,p),         by definition of VALUE,

contradicting the choice of p. □

In Appendix III, we show that Ψ has a unique minimal fixed point ψ*.
(See Figures 2 and 3 for examples of the minimal fixed point cover.) We then
show how to efficiently compute ψ*.

The overall plan of Section 2 is to introduce (in Section 2.1) a special
class of graphs called global value graphs which represent the flow of values
(rather than control) through the program Π. We define, for each global
value graph GVG, a set Γ_GVG of approximate covers associated with it.
Appendix III shows Γ_GVG is in each case a finite semilattice which thus has
a unique minimal element ψ*_GVG, which is efficiently calculated by the
algorithm presented in Sections 2.2-5. As we show in Appendix III, for a
particular choice of GVG, ψ*_GVG is actually the minimal fixed point of
the functional Ψ, so our general algorithm does indeed compute ψ*.


1.5  Comparison  with  Previous  Work 


In order to compare our methods with others we must fix the relevant
parameters of the program and control flow graph. Let n and a be the cardinality
of the node and edge sets, respectively, of the control flow graph. Let σ be
the number of variables occurring within more than one block of the program (if
we built into the programming language a construct for the declaration of
variables local to a block, then the parameter σ is the number of global
variables). Let ℓ be the length of the program text. Our careful
consideration of the parameter ℓ (avoiding, for example, redundant
representations of the same expression) is one of the novelties of our approach.
Previous authors have analyzed program optimization algorithms primarily from
the point of view of the control flow graph parameters n and a.

Kildall [Ki] presents an iterative algorithm for computing approximate
solutions to various expression optimization problems. The discovery of constant
text expressions by Kildall's algorithm may require Ω(σ(ℓ+a)) elementary steps
and Ω(σa) operations on bit vectors of length O(σℓ). (Ω(f(x)) is a function
bounded from below by k·f(x) for some k. See Knuth [Kn2].) Kam and Ullman
[KU2] show that the Kildall algorithm discovers only a restricted class of text
expressions covered by constant symbols. (See Figure 4.) Neither of these
authors considered the more general problem of computing covers of text
expressions.

As described in Section 1.4 an iterative algorithm may also be used to
compute a certain class of covers, which we have characterized as fixed points
of an update functional Ψ mapping approximate covers to improved covers.
Fong, Kam, and Ullman [FKU] give another algorithm, using a direct (noniterative)
method which could be adapted to give covers, though these covers would be
weaker than those fixed point covers and their algorithms are restricted to
reducible flow graphs. We will assume these algorithms are executed on a
unit cost random access machine. The iterative algorithm requires Ω(ℓn²)
elementary steps and Fong, Kam, and Ullman's algorithm requires Ω(ℓ a log(a))
elementary steps. One source of inefficiency of both of these algorithms is
in the representation of the covers. Directed acyclic graphs (dags) are used
to represent expressions, but separate dags are needed at each node of the
flow graph. Since a dag representing a cover may be of size Ω(ℓ) the total
space cost may be Ω(ℓn). Various operations on these dags, which are
considered to be "extended" steps by Fong, Kam, and Ullman [FKU], cost Ω(ℓ)
elementary steps and cannot be implemented by any fixed number of bit vector
operations. In general, any similar algorithm for computing a cover which
attempts to pool information separately at each node of the flow graph will
have time cost of Ω(ℓa), since the pools on every pair of adjacent nodes must
be compared. Since ℓ > n, such a time cost may be unacceptable for practical
applications.

[Figure caption: ℰ(Z,n) = X + Y is a text expression which is covered
by the constant 5 but is not discovered by Kildall's algorithm.]

Another  problem  with  these  previous  methods  is  they  do  not  necessarily 
compute  good  covers.  The  iterative  algorithm  only  computes  a  fixed  point  of 
Ψ, but not necessarily its minimal fixed point (again, see Figure 3). Our
algorithm  always  gives  the  minimal  fixed  point.  At  any  rate,  this  paper  and 
subsequent  papers  [R2] ,  [TR]  were  the  first  in  the  literature  directly  concerned 
with  computing  covers. 

The  global  value  graphs  used  in  this  paper  contain  dags  of  program  blocks 
as  well  as  the  use-def  edges  of  [Sc]  to  represent  the  global  flow  of  values 
through  the  program.  The  use  of  a  global  value  graph  leads  to  our  efficient 
direct  algorithm  for  computing  covers  which  works  for  all  flow  graphs.  The 
method  derives  its  efficiency  by  representing  the  covers  with  a  single  dag, 
rather than a separate dag at each node. The global value graph GVG_0 is of
size O(σa + ℓ), although the results of [RT] may be used to build a global
value graph which in many cases is of size O(a + ℓ) (see Section 3). In
elementary operations, the time cost of our algorithm for the discovery of
constants is linear in the size of GVG, and our algorithm for finding the
cover which is the minimal fixed point of Ψ requires time almost linear in
the size of the GVG. Thus our algorithm for symbolic evaluation takes worst
case time almost linear in σa + ℓ (a + ℓ in many cases), as compared to the
iterative algorithm which may require Ω(ℓn²) steps. Recently, Reif and
Tarjan [RT] gave an algorithm which computes simple covers (weaker than minimal
fixed points of Ψ) in time almost linear in ℓ + n + a. This algorithm also
uses a single dag for representing the simple cover and works for all flow
graphs.


2.  AN EFFICIENT ALGORITHM FOR COMPUTING A COVER

2.1  Dags and Global Value Graphs

A labeled dag D = (V,E,L) is a labeled, acyclic, oriented digraph with a
node set V, an edge list E giving the order of edges departing from nodes,
and a labeling L of the nodes in V. A rooted labeled dag (D,r) represents
an expression ℰ if ℰ is the parenthesized listing of the labels of the
subgraph of D rooted at r in topological order from r to the leaves and from
left to right. Where D is fixed, we simply say r represents ℰ if (D,r)
so represents ℰ. (See Figure 5.)

The dag D is minimal if each node r ∈ V represents a distinct expression.
Any expression or set of expressions may be represented, with no redundancy, by
a minimal dag. For each block n we use a minimal dag D(n) to represent
efficiently the set of text expressions located at block n. We have assumed that
each block is reduced, so each node in D(n) corresponds to a unique text
expression. [AU1] describe the use of dags for representing computations within
blocks. [Ki] and [FKU] have applied dags to various global flow problems.
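One common way to obtain a minimal dag is to intern each (label, successors) pair so that equal subexpressions share a node. The sketch below illustrates that idea; it is not the construction of [AU1], and the class name Dag and its fields are hypothetical. Expressions are assumed to be prefix tuples with integers and strings as leaves.

```python
class Dag:
    """Minimal labeled dag: each distinct expression gets exactly one node."""

    def __init__(self):
        self.index = {}   # (label, successor ids) -> node id
        self.label = []   # node id -> label
        self.succ = []    # node id -> ordered tuple of successor ids

    def node(self, e):
        """Return the id of the node representing expression e."""
        if isinstance(e, tuple):                 # function application
            key = (e[0], tuple(self.node(a) for a in e[1:]))
        else:                                    # constant or input variable
            key = (e, ())
        if key not in self.index:                # first time we see this term
            self.index[key] = len(self.label)
            self.label.append(key[0])
            self.succ.append(key[1])
        return self.index[key]


d = Dag()
shared = ("-", "X^n", 1)
root = d.node(("*", shared, ("+", shared, 4)))
print(len(d.label))   # 6
print(d.succ[root])   # (2, 4)
```

The subexpression (- X^n 1) occurs twice in the input but is stored once, so the dag has six nodes rather than nine.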

We  now  come  to  the  central  definition.  To  model  the  flow  of  values 
through  a  program  II,  we  introduce  a  class  of  labeled  digraphs  called  global 
value  graphs.  These  are  derived  by  combining  the  dags  of  all  the  blocks  in  N 
and  adding  a  set  of  edges  called  use-def  edges  (which  pair  nodes  labeled  with 
input  variables  to  other  nodes).  More  precisely,  a  global  value  graph  is  a 
possibly  cyclic,  labeled,  oriented  digraph  GVG  =  (V,E,L)  such  that: 

(1)  the  node  set  V  is  the  union  of  the  node  sets  of  the  dags  of  N, 

(2)  E is an edge list containing (a) the edge list of each D(n) and
(b) a set of pairs in V² (use-def edges) such that (i) the first node of each
use-def edge is labeled with an input variable and (ii) for each v ∈ V
labeled with an input variable X^n, and control path p from s to n, there
is some use-def edge departing from v and entering a node located at a block
in p and distinct from n.

(3)  L  is  a  labeling  of  V  identical  to  the  vertex  labeling  of  each  D(n) . 

Note that for each v ∈ V, if v represents a constant symbol c then v
is labeled with c and has no departing edges; if v represents a function
application (θ t_1 ... t_k) then v is labeled with the k-adic function symbol
θ and u_1,...,u_k are the immediate successors of v in GVG representing
t_1,...,t_k, respectively; if v represents an input variable X^n then v is
labeled with X^n and all the edges departing from v are use-def edges. For
each node v ∈ V, let loc(v) be the block in N where the text expression
which v represents is located.

We  assume  here,  as  in  Section  1,  that  the  set  of  text  expressions  of  each 
block  n€N  includes  all  input  variables  at  n.  This  may  require  adding  dummy 
assignments of the form X := X to satisfy this assumption. Let Γ_GVG be the
set of mappings ψ from V to EXP such that for all v ∈ V,

(1)  if L(v) is a constant symbol c then ψ(v) = c, or

(2)  if L(v) is a function symbol θ and v has immediate successors
u_1,...,u_k (in this order) then ψ(v) is the reduced expression derived from
(θ ψ(u_1) ... ψ(u_k)), or

(3)  if L(v) is an input variable then either (a) ψ(v) = L(v) or
(b) ψ(v) = ψ(u) for all use-def edges (v,u) departing from v.

Note that for any node v satisfying (2), ψ(v) is determined from the
input variables occurring in the text expression which v represents. Hence
any ψ ∈ Γ_GVG is uniquely specified by the set of input variables satisfying
case (3a), so Γ_GVG has at most 2^k elements, where k is the number of nodes
labeled with input variables.

In Appendix III we show that Γ_GVG is a finite semilattice, and hence
has a minimal element.

Let GVG_0 be the standard global value graph containing, for each program
variable X ∈ Σ and edge (m,n) ∈ A of the control flow graph F, only the use-def
edges (v,u) where v represents the input variable X^n and u represents the output
expression ℰ(X,m). (See Figure 6.) Note that while there are in the worst
case ℓn possible use-def edges, GVG_0 contains at most σa use-def edges.

Let ψ* be the minimal fixed point of Ψ, the functional defined in Section 1.4.
Appendix III shows ψ* to be identical to the minimal element of Γ_GVG for the
standard global value graph GVG_0. (Also, in Section 3 we define another global
value graph with the same property, but which often is of size linear in ℓ+a.)
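The use-def edges of the standard global value graph can be enumerated directly from the control flow edges. The sketch below assumes two lookup tables that are not part of the paper: invar_node maps (X, n) to the node representing the input variable X^n, and out_node maps (X, m) to the node representing the output expression of X at m (for a block that does not assign X, this is the node for its input variable, per the dummy assignment X := X). Node names are hypothetical.

```python
def standard_use_def_edges(cfg_edges, variables, invar_node, out_node):
    """Use-def edges (v, u) of the standard global value graph.

    For every control flow edge (m, n) and variable X, link the node for the
    input variable X^n to the node for the output expression of X at m.
    The start block has no input variables, so it has no entry in invar_node.
    """
    return {(invar_node[X, n], out_node[X, m])
            for (m, n) in cfg_edges
            for X in variables
            if (X, n) in invar_node}


# Block s: X := 1; Y := 0.   Block n (a loop): Y := Y + X.
cfg_edges = [("s", "n"), ("n", "n")]
variables = ["X", "Y"]
invar_node = {("X", "n"): "xn", ("Y", "n"): "yn"}
out_node = {("X", "s"): "c1", ("Y", "s"): "c0",
            ("X", "n"): "xn", ("Y", "n"): "add"}

edges = standard_use_def_edges(cfg_edges, variables, invar_node, out_node)
print(sorted(edges))
# [('xn', 'c1'), ('xn', 'xn'), ('yn', 'add'), ('yn', 'c0')]
```

Each pair of a variable and a control flow edge contributes at most one use-def edge, which matches the bound stated in the text.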


2.2  Detection of Constants

Let GVG = (V,E,L) be an arbitrary global value graph. Let ψ
be a minimal element of Γ_GVG. We wish to compute a new labeling L' of V
such that for each v ∈ V, if ψ(v) is a constant sign c then L'(v) = c and
otherwise L'(v) = L(v). Nodes thus relabeled with constants may be discovered
by propagating possible constants through GVG, starting from nodes originally
labeled with constants, and then testing for conflicts. This leads to an
algorithm for constant propagation with time cost linear in the size of the GVG.

Recall that a spanning tree of the control flow graph F = (N,A,s) is a
tree rooted at s, with node set N, and edge set contained in A. A
preordering of a tree orders fathers before sons. Let < be a preordering of
some spanning tree of F. For each v ∈ V, let loc(v) be the node in N at which the
text expression associated with v is located. We construct an acyclic subgraph of
GVG by deleting the set of use-def edges Ê = {(v,u) | loc(v) < loc(u)}. Observe
that (V,E−Ê) is acyclic. We shall propagate constants in a topological order (see
Appendix I for definition) of (V,E−Ê), from leaves to roots. (See Figure 7.)

Our  algorithm  for  computing  the  new  labeling  L'  is  given  below. 

ALGORITHM A

INPUT  global value graph GVG = (V,E,L) and control flow graph F.

OUTPUT L'.

begin
    declare L' to be an array of length |V|;
    Let < be a preordering of a spanning tree of F;
    Q := $\hat{E}$ := the empty set {};
    for all use-def edges $(v,u) \in E$ such that loc(v) < loc(u)
        do add (v,u) to $\hat{E}$;
    comment propagate constants;
L0: for each $v \in V$ in topological order of $(V, E-\hat{E})$
        from leaves to roots do
        if L(v) is a constant sign c then L1: L'(v) := c
        else if L(v) is a k-adic function symbol $\theta$,
            $u_1,\ldots,u_k$ are the immediate successors of v in
            GVG, and $(\theta\; L'(u_1) \ldots L'(u_k))$ reduces to a
            constant c then L2: L'(v) := c
        else if L(v) is an input variable and there
            is a constant c such that L'(u) = c
            for all use-def edges (v,u) departing from v
            then L3: L'(v) := c
        else begin add v to Q; L'(v) := L(v) end;
    comment test for conflicts;
L4: for each $v \in V$ labeled with an input variable do
        if v has a departing use-def edge (v,u) such that
            $L'(v) \neq L'(u)$ then add v to Q;
    till Q = the empty set {} do
        begin
            delete some node v from Q;
            if L'(v) is a constant then
L5:             begin
                    L'(v) := L(v);
                    add all immediate predecessors of v in GVG to Q;
                end;
        end;
end;
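The steps of Algorithm A can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It assumes that labels are encoded as tuples `('const', c)`, `('func', op)`, or `('input', name)`, that the caller supplies the deleted edge set and a leaves-to-roots order, and that step L3 consults only the use-def edges that were not deleted, as the proof of Lemma 2.1 suggests. The helper `fold` and the example graph are hypothetical.

```python
def propagate_constants(label, succ, deleted, order, fold):
    """Return the new labeling L' as a dict (sketch of Algorithm A).

    label[v] : ('const', c) | ('func', op) | ('input', name)
    succ[v]  : ordered list of immediate successors of v in the GVG
    deleted  : use-def edges removed to make the graph acyclic
    order    : leaves-to-roots topological order of the remaining graph
    fold     : fold(op, constants) -> constant, or None if not reducible
    """
    def is_const(lab):
        return lab[0] == "const"

    new, queue = {}, []
    for v in order:                                        # L0
        kind, out = label[v][0], succ.get(v, [])
        kept = [u for u in out if (v, u) not in deleted]
        if kind == "const":                                # L1
            new[v] = label[v]
            continue
        if kind == "func" and all(is_const(new[u]) for u in out):
            c = fold(label[v][1], [new[u][1] for u in out])
            if c is not None:                              # L2
                new[v] = ("const", c)
                continue
        if (kind == "input" and kept and is_const(new[kept[0]])
                and all(new[u] == new[kept[0]] for u in kept)):
            new[v] = new[kept[0]]                          # L3
            continue
        queue.append(v)
        new[v] = label[v]
    for v in order:                                        # L4: test for conflicts
        if label[v][0] == "input" and any(new[v] != new[u] for u in succ.get(v, [])):
            queue.append(v)
    preds = {}
    for v, out in succ.items():
        for u in out:
            preds.setdefault(u, []).append(v)
    while queue:
        v = queue.pop()
        if is_const(new[v]):                               # L5: undo and propagate
            new[v] = label[v]
            queue.extend(preds.get(v, []))
    return new

def fold(op, args):
    """Hypothetical interpretation: only integer addition is reducible."""
    return sum(args) if op == "+" else None

label = {"c2": ("const", 2), "c3": ("const", 3), "f": ("func", "+"),
         "x": ("input", "X"), "y": ("input", "Y"), "g": ("func", "+"),
         "k": ("input", "K"), "m": ("input", "M")}
succ = {"f": ["c2", "c3"], "x": ["f"], "y": ["c2", "g"], "g": ["y", "c3"],
        "k": ["c3", "m"], "m": ["k"]}
deleted = {("y", "g"), ("k", "m")}
order = ["c2", "c3", "f", "x", "y", "g", "k", "m"]
result = propagate_constants(label, succ, deleted, order, fold)
```

In the example, Y is first guessed to be 2, the conflict test at L4 detects that the loop redefines it, and the guess for Y and for the node that depends on it is withdrawn. K is only copied around its loop, so its constant label survives.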


LEMMA 2.1. If $\psi(v)$ is a constant then L'(v) is set to $\psi(v)$ at L1, L2,
or L3.

Proof, by induction on the topological order of $(V, E-\hat{E})$.

Basis Step. Suppose v is a leaf of $(V, E-\hat{E})$. Then L(v) is a constant
sign and so L'(v) is set to $L(v) = \psi(v)$ at L1.

Induction Step. Suppose v is in the interior of $(V, E-\hat{E})$ and L'(u) has
been set to $\psi(u)$ for all u occurring before v in the topological order
where $\psi(u)$ is a constant. Then v represents either a function application
or an input variable.

CASE 1. Suppose L(v) is a k-adic function sign $\theta$ and $u_1,\ldots,u_k$ are the
immediate successors of v in $(V, E-\hat{E})$. If $\psi(v)$ is a constant c then by
definition of $\Gamma_{GVG}$, $\psi(u_1),\ldots,\psi(u_k)$ are constants $c_1,\ldots,c_k$, respectively,
and $(\theta\; c_1 \ldots c_k)$ reduces to c. By the induction hypothesis $L'(u_1),\ldots,L'(u_k)$
have been previously set to $c_1,\ldots,c_k$ and so L'(v) is set to c at L2.

CASE 2. Otherwise, L(v) is an input variable $X^n$. If $\psi(v)$ is a constant
symbol c then $\psi(v) \neq X^n$ so by definition of $\Gamma_{GVG}$, $c = \psi(u)$ for all use-def
edges (v,u) departing from v. By the induction hypothesis, L'(u) has been
set to $c = \psi(u)$ for each use-def edge $(v,u) \in E-\hat{E}$. Now we must show v has
some departing value edge $(v,u) \in E-\hat{E}$. Let T be the spanning tree of F
with preorder <. Consider the path p in T from the start block s to
n. By definition of GVG, there is a use-def edge (v,u) such that loc(u)
is distinct from n and is contained in p. Hence $(v,u) \in E-\hat{E}$ and L'(v)
is set to c at L3. □

Let $\hat{Q}$ be the value of Q just after L4. Then $v \in V$ is eventually
added to Q and L'(v) reset to L(v) iff some element of $\hat{Q}$ is reachable
in GVG from v. If $v \in V$ is labeled by L' with a constant at L4, then
we show

LEMMA 2.2. $\psi(v)$ is not a constant iff some element of $\hat{Q}$ is reachable in
GVG from v.

Proof. IF. Suppose $\psi(v)$ is not a constant, but no element of $\hat{Q}$ is
reachable from v. Then let $\psi'$ be the mapping from V to EXP such that
for each $u \in V$, $\psi'(u)$ is the reduced expression derived from $\psi(u)$ after
substituting $\psi(w)$ for each input variable represented by a node w (i.e.,
w is the unique node labeled with that input variable) from which an element
of $\hat{Q}$ is reachable. Then $\psi' \in \Gamma_{GVG}$ but $\mathrm{origin}(\psi'(v)) = s \neq \mathrm{origin}(\psi(v))$,
contradicting the assumption that $\psi$ is the minimal element of $\Gamma_{GVG}$.

ONLY IF. Suppose some element of $\hat{Q}$ is reachable from v in GVG.
Clearly if $v \in \hat{Q}$, then $\psi(v)$ is not a constant. Assume for some k > 0, if
there is a path of length less than k in GVG from some $u \in V$ to an element
of $\hat{Q}$, then $\psi(u)$ is not a constant sign. Suppose there is a path
$(v = w_0, w_1, \ldots, w_k)$ of length k from v to $w_k \in \hat{Q}$. If k = 1, then $w_1 \in \hat{Q}$,
and otherwise if k > 1, then $(w_1, \ldots, w_k)$ is a path of length k-1. By the
induction hypothesis, $\psi(w_1)$ is not a constant. But $(v,w_1) \in E$ and by the
definition of $\Gamma_{GVG}$, $\psi(v)$ is not a constant. □
THEOREM 2.1. Algorithm A is correct and has time cost linear in the size of
the GVG.

Proof. The correctness of Algorithm A follows directly from Lemmas 2.1 and
2.2.

In addition we must show Algorithm A has time cost linear in |V| + |E|.
The initialization costs time linear in |V|. The preordering < may be
computed in time linear in |N| + |A| by the depth first search algorithm of
[T1]. The time to process each $v \in V$ at steps L0 and L4 is
O(1 + outdegree(v)). Step L5 can be reached at most |V| times and the time
cost to process each node v at step L5 is O(1 + indegree(v)). Thus, the
total time cost is linear in |V| + |E|. □

In some cases, we may improve the power of Algorithm A for particular
interpretations by applying algebraic identities to reduce expressions in
EXP more often to constant symbols. For example, in the arithmetic domain
we can use the fact that 0 is the zero (annihilating) element under integer
multiplication to modify Algorithm A so that if node v is labeled by L with
the multiplication symbol and a successor of v in GVG is covered by 0,
then at step L2 we may set L'(v) to the constant 0.
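As a small illustration of such an identity, the following Python sketch folds a function application even when some operands are not known to be constant. The function name `fold_with_identities` and the convention that `None` marks a non-constant operand are assumptions made for this example only.

```python
def fold_with_identities(op, args):
    """Reduce (op args...) to a constant if possible, else return None.

    args holds integer constants; None marks an operand that is not
    known to be constant.
    """
    if op == "*" and any(a == 0 for a in args):
        return 0                      # x * 0 = 0 even when x is unknown
    if any(a is None for a in args):
        return None                   # no identity applies, cannot reduce
    if op == "*":
        product = 1
        for a in args:
            product *= a
        return product
    if op == "+":
        return sum(args)
    return None                       # uninterpreted function symbol
```

A function node whose operands include a node covered by 0 can then be labeled with the constant 0 without waiting for its other operands to become constant.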

From the new labeling L' and GVG = (V,E,L), we construct a reduced
global value graph GVG' = (V,E',L') with labeling L' and with edge set
E' derived from E by deleting all edges departing from nodes labeled by
L' with constant symbols. This corresponds to substituting constant
symbols for constant text expressions in the program $\Pi$. We assume
throughout the next three sections that GVG is so reduced.


Figure 7. A simple example of constant propagation through the global
value graph.


2.3  A Partial Characterization of $\psi$, the Minimal Element of $\Gamma_{GVG}$

Let GVG = (V,E,L) be a reduced global value graph as constructed by
Algorithm A of the last section. Let $\psi$ be the minimal element of $\Gamma_{GVG}$.
Let $\hat{V}$ be the set of nodes in V labeled with constant and function
symbols. Observe that $\Gamma_{GVG}$ characterizes exactly the values of any such
$\psi$ over nodes in $\hat{V}$ in terms of the values of $\psi$ over the nodes in $V-\hat{V}$,
i.e., in terms of the nodes labeled with input variables. The following
Theorem characterizes $\psi$ over $V-\hat{V}$ in terms of $\psi$ over $\hat{V}$.

We require first a few additional definitions. A use-def path is a
path p in GVG traversing only nodes linked by use-def edges. A use-def
path is maximal if the last node of p has no departing value edges. For
any node $v \in V$ labeled with an input variable, let H(v) be the set of
nodes in V lying at the end of a maximal use-def path from v. Note that
H(v) is a subset of $\hat{V}$. Call two paths disjoint if they have only their
initial node in common.
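The set H(v) can be computed by a simple graph search, sketched below in Python. The sketch assumes that "value edges" in the definition of a maximal path means use-def edges, so that a maximal path ends at the first node with no departing use-def edge; the name `maximal_use_def_ends` and the example edges are hypothetical.

```python
def maximal_use_def_ends(v, use_def_succ):
    """Return H(v): the end nodes of maximal use-def paths starting at v.

    use_def_succ[x] lists the targets of use-def edges departing from x;
    nodes labeled with constants or function symbols have no entry.
    """
    ends, seen, stack = set(), {v}, [v]
    while stack:
        x = stack.pop()
        targets = use_def_succ.get(x, [])
        if not targets:
            ends.add(x)               # x ends a maximal use-def path
        for u in targets:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return ends

# Hypothetical use-def edges: xb -> xa, xa -> c0, xa -> g, and a cycle p <-> q.
use_def_succ = {"xa": ["c0", "g"], "xb": ["xa"], "p": ["q"], "q": ["p", "c"]}
```

Because visited nodes are never revisited, the search terminates on cyclic use-def chains such as the one through p and q.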

THEOREM 2.2. If v is labeled with an input variable, then either

(a) $\psi(v) = \psi(u)$ for all $u \in H(v)$, or

(b) $\psi(v) = L(u)$, where u is the unique node such that

    (i) u lies on all maximal use-def paths from v but
    (ii) there are disjoint maximal use-def paths from v to nodes
         $u_1, u_2 \in H(v)$ such that $\psi(u_1) \neq \psi(u_2)$. (See Figure 8.)

Proof. Suppose $\psi(v)$ is not an input variable, so there exists a maximal
use-def path p from v to some $u_1 \in H(v)$ such that $\psi(v) = \psi(u_1)$. Assume
there exists another maximal use-def path p' from v to some $u_2 \in H(v)$
such that $\psi(v) \neq \psi(u_2)$. Let z be the first element of p' such that
$\psi(z) \neq \psi(v)$ and let z' be the immediate predecessor of z in p', so
$\psi(z') = \psi(v)$. Then by definition of $\Gamma_{GVG}$, $\psi(v) = \psi(z') = L(z')$ is an input
variable, a contradiction.

Suppose $\psi(v)$ is an input variable, so $\psi(v) = L(u)$ for some $u \in V$.
For any maximal use-def path p from v, let z be the first element of
p such that $\psi(z) \neq L(u)$ and let z' be the immediate predecessor of z
in p. Then by definition of $\Gamma_{GVG}$, $\psi(z') = L(z') = L(u)$ so z' = u is
contained on p. Now suppose that there is a node $w \in V$ distinct from u
and contained on all maximal use-def paths from u.

Consider any control path q from the start block s to block loc(u).
By Lemma 2.3, we can construct a maximal use-def path $(u = w_1, \ldots, w_k)$ such
that $\mathrm{loc}(w_1), \ldots, \mathrm{loc}(w_k)$ are distinct blocks in q. Hence, loc(w)
properly dominates loc(u).

Let $\psi'$ be the mapping from V to EXP such that for all $v' \in V$,
$\psi'(v')$ is derived from $\psi(v')$ by substituting L(w) for each input
variable labeling a node from which all maximal use-def paths contain w.
Then $\psi' \in \Gamma_{GVG}$. But $\mathrm{origin}(\psi'(v)) = \mathrm{loc}(w)$ properly dominates $\mathrm{loc}(u) =
\mathrm{origin}(\psi(v))$, contradicting our assumption that $\psi$ is minimal over $\Gamma_{GVG}$. □

Theorem 2.2 suggests a procedure for calculating $\psi$, but there is an
implicit circularity since the calculation (using Theorem 2.2) of $\psi(v)$ for
$v \in V-\hat{V}$ requires the determination (using the definition of $\Gamma_{GVG}$) of $\psi(u)$
for $u \in H(v)$; but since $u \in \hat{V}$, the calculation of $\psi(u)$ may require the
determination of $\psi(w)$ for some other $w \in V-\hat{V}$. The way out is by the rank
decomposition discussed in the next section. There will remain the problem
of finding disjoint paths, which we consider in Section 2.5. This allows us
to apply Theorem 2.2 without circularity.
2.4  Rank Decomposition of a Reduced GVG

This section describes a decomposition of the nodes of a reduced
GVG = (V,E,L) into sets for which we may completely characterize the minimal
$\psi \in \Gamma_{GVG}$. This leads to an algorithm for the construction of $\psi$.

Fong, Kam, and Ullman [FKU] describe the rank decomposition of a dag;
this provides a topological ordering of a dag from leaves to roots over
which the dag may be efficiently reduced. Here we generalize the rank
decomposition to a possibly cyclic GVG; this provides us a method of
partitioning V into sets of text expressions over which $\psi$ may have the
same value; it also allows us to apply Theorem 2.2 without circularity,
characterizing completely the minimal $\psi \in \Gamma_{GVG}$. In Section 2.5 we apply
the rank decomposition to implement our direct method for symbolic
evaluation.

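The rank of a node $v \in V$ is defined:

$$
\mathrm{rank}(v) =
\begin{cases}
0 & \text{if } v \text{ is labeled with a constant symbol,}\\
1 + \max\{\mathrm{rank}(u) \mid (v,u) \in E\} & \text{for } v \text{ labeled with a function symbol,}\\
\min\{\mathrm{rank}(u) \mid u \in H(v)\} & \text{for } v \text{ labeled with an input variable.}
\end{cases}
$$

(See Figure 9.)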

Observe that in the very simple case where $\Pi$ contains only a single
block of code, at the start block s, GVG consists of the dag D(s).
Hence the rank of a node $v \in V$ is the length of a longest path from v to
a leaf of the dag D(s), which induces a topological ordering of the dag D(s)
from leaves to roots.

LEMMA 2.3. $\psi(v) = \psi(v')$ implies rank(v) = rank(v').

Proof. We proceed by induction on rank of v.

Basis Step. Suppose v is of rank 0, so $\psi(v) = \psi(v')$ is a constant
symbol c. But since GVG is reduced, L(v') = c and v' is also of rank 0.

Inductive Step. Suppose for some r > 0, rank(w) = rank(w') for all $w,w' \in V$
such that rank(w) < r and $\psi(w) = \psi(w')$. Consider some $v,v' \in V$ such that
rank(v) = r.

CASE a. Suppose $\psi(v) = \psi(v')$ is the function application $(\theta\; e_1 \ldots e_k)$. Then
by Theorem 2.2, $\psi(v) = \psi(u)$ for all $u \in H(v)$, and similarly, $\psi(v') = \psi(u')$
for all $u' \in H(v')$. Fix some $u \in H(v)$ and $u' \in H(v')$. By definition of
$\Gamma_{GVG}$, $L(u) = L(u') = \theta$ and if $w_1,\ldots,w_k$ are the immediate successors of u
and $w'_1,\ldots,w'_k$ are the immediate successors of u', then $e_i = \psi(w_i) = \psi(w'_i)$
for i = 1,...,k. By the induction hypothesis, $\mathrm{rank}(w_i) = \mathrm{rank}(w'_i)$ for
i = 1,...,k. Hence,

$$
\begin{aligned}
\mathrm{rank}(v) &= \mathrm{rank}(u)\\
&= 1 + \max\{\mathrm{rank}(w_1),\ldots,\mathrm{rank}(w_k)\}\\
&= 1 + \max\{\mathrm{rank}(w'_1),\ldots,\mathrm{rank}(w'_k)\}\\
&= \mathrm{rank}(u')\\
&= \mathrm{rank}(v').
\end{aligned}
$$

CASE b. Suppose $\psi(v) = \psi(v')$ is an input variable. By Theorem 2.2,
$\psi(v) = \psi(v') = L(u)$ for some $u \in V$ contained on all value paths from v and
v'. Hence, rank(v) = rank(v') = rank(u). □

To compute the rank of all nodes in GVG we use a modified version of
the depth first search developed by Tarjan [T1]. Because the search proceeds
backwards, we require reverse adjacency lists to store edges in E. Note that
RANK(v) is used in two different ways; first to store the number of
successors of node v which have not been visited, and later RANK(v) is
set to rank(v). Let $V_r$, $\hat{V}_r$ be the nodes in V, $\hat{V}$ of rank r. We
initially compute $\hat{V}_0$, and on the r'-th execution of the main loop we compute
$V_r - \hat{V}_r$ and $\hat{V}_{r+1}$.

ALGORITHM B

INPUT   GVG = (V,E,L)
OUTPUT  RANK

begin
    declare RANK := an array of integers of length |V|;
    for all $v \in V$ do
        RANK(v) := - outdegree(v);
    r := 0;
    Q' := {v | L(v) is a constant symbol};
    until Q' = the empty set {} do
        begin
            Q := Q'; Q' := the empty set {};
            comment Q = $\hat{V}_r$;
L:          until Q = the empty set {} do
                begin
                    delete v from Q;
                    for each immediate predecessor u of v do
                        if L(u) is a function symbol then
                            if RANK(u) = -1 then
                                begin
                                    comment $u \in \hat{V}_{r+1}$;
                                    RANK(u) := r+1;
                                    add u to Q'
                                end
                            else RANK(u) := RANK(u) + 1
                        else if RANK(u) < 0 then
                            begin
                                comment $u \in V_r - \hat{V}_r$;
                                RANK(u) := r;
                                add u to Q
                            end;
                end;
            r := r + 1;
        end;
end.
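Algorithm B translates almost line for line into Python. The sketch below is an illustration rather than the authors' code: the node kinds `'const'`, `'func'`, `'input'`, the function name `compute_ranks`, and the example graph are assumptions made here, and the test on the label is applied to the predecessor u, which is the node whose rank is being updated.

```python
def compute_ranks(kind, succ):
    """Return rank[v] for every node (sketch of Algorithm B).

    kind[v] : 'const', 'func' or 'input'
    succ[v] : list of immediate successors of v in the reduced GVG
    """
    preds = {v: [] for v in kind}
    for v, targets in succ.items():
        for u in targets:
            preds[u].append(v)
    # A negative entry counts the successors that have not been visited yet.
    rank = {v: -len(succ.get(v, [])) for v in kind}
    r = 0
    next_queue = [v for v in kind if kind[v] == "const"]
    while next_queue:
        queue, next_queue = next_queue, []
        while queue:
            v = queue.pop()
            for u in preds[v]:
                if kind[u] == "func":
                    if rank[u] == -1:          # last unvisited successor
                        rank[u] = r + 1
                        next_queue.append(u)
                    else:
                        rank[u] += 1
                elif rank[u] < 0:              # input variable, first visit
                    rank[u] = r
                    queue.append(u)
        r += 1
    return rank

# Hypothetical graph: f = F(c), x reads c or f, g = G(x, f), y and y2 form a cycle.
kind = {"c": "const", "f": "func", "x": "input", "g": "func",
        "y": "input", "y2": "input"}
succ = {"f": ["c"], "x": ["c", "f"], "g": ["x", "f"], "y": ["g", "y2"], "y2": ["y"]}
ranks = compute_ranks(kind, succ)
```

The input variable x receives the minimum rank among the ends of its maximal use-def paths, while the function node g receives one more than the maximum rank of its operands.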


THEOREM 2.3. Algorithm B is correct and has time cost linear in |V| + |E|.

Proof by induction on r.

Basis Step. Initially, RANK(v) is set to -(outdegree of v) for each $v \in V$.
So if v is labeled with a constant symbol then RANK(v) is set to 0.
Also, Q is initially set to $\hat{V}_0$ just before label L.

Inductive Step. Suppose for some r > 0, we have on entering the inner loop
at label L on the r'-th time:

(1) Q = $\hat{V}_r$,

(2) For each $v \in V$, RANK(v) = rank(v) if rank(v) < r or $v \in \hat{V}_r$, and
    RANK(v) = -(number of successors of v with rank > r) if
    rank(v) > r or $v \in V_r - \hat{V}_r$.

In the inner loop we add to Q exactly the nodes $V_r - \hat{V}_r = \{v \in V - \hat{V} \mid$ some element
of $\hat{V}_r$ is reachable by a use-def path from v$\}$. For each such $v \in V_r - \hat{V}_r$
added to Q, RANK(v) is set to r. Also, for each $v \in \hat{V}$, if rank(v) > r+1
then RANK(v) is incremented by 1 for each immediate successor of v of
rank r; if rank(v) = r+1 then all immediate successors of v are of rank $\leq r$
so RANK(v) is set to r+1 and v is added to Q'. Thus, (1) and (2) are
satisfied entering the loop on the r+1 time.

Now we show that Algorithm B may be implemented in linear time. For
each node $v \in V$ we keep a list (the reverse adjacency list), giving all
predecessors of v. To process any $v \in Q'$ requires time O(1 + indegree(v)).
Since each node is added to Q' exactly once, the total time cost is linear
in |V| + |E|. □

This suffices for the construction of $\psi$: the values $\psi(v)$ for v in
$\hat{V}_0$, $V_0 - \hat{V}_0$, $\hat{V}_1$, $V_1 - \hat{V}_1$, ...
may be determined by alternately applying the definition of $\Gamma_{GVG}$
and Theorem 2.2.

Using this method could be inefficient, since Theorem 2.2 could be
expensive to apply and the representations of the values could grow rapidly
in size. The first problem is solved by reducing it to the problems of P-graph
completion and decomposition as described in the next subsection 2.5. The
second problem is solved by constructing a special labeled dag; the construction
of this dag and the final algorithm are given in Section 2.6.

2.5  P-Graph  Completion  and  Decomposition 

Let GVG = (V,E,L) be a reduced global value graph. This section presents
an efficient method for applying Theorem 2.2 to nodes in $V_r - \hat{V}_r$ (i.e., nodes
of rank r labeled with input variables). Now to compute $\psi^*$, the minimal
element of $\Gamma_{GVG}$, it suffices to find the partitioning of V such that
$\psi^*(v) = \psi^*(u)$ iff v, u are in the same component of the partition. To represent
such a partitioning, we distinguish one node of each component of the partitioning
to be the value source of all other nodes of that block. We require that if
$v \in V-\hat{V}$ (i.e., v is labeled with an input variable) then $\psi^*(v) = L(v)$ iff v
is a value source. Let V* be the set of value sources and let VS be a
mapping from nodes in V to their value sources. Hence the fixed points of VS
are the value sources and $VS^{-1}[V^*]$ is a partitioning of V. Note that, in
general, the definition of "value source" is not uniquely determined, so the
definition of V* and VS depends on our particular choice of value sources.

We shall find value sources by reducing this problem to the problems of
P-graph completion and decomposition stated below.

Let $G = (V_G, E_G)$ be any directed graph and let $S \subseteq V_G$ be a set of vertices
of G such that for each vertex $v \in V_G$ there is some vertex $u \in S$ from which
v is reachable.
P-Graph Completion Problem. Find

$S^+ = S \cup \{v \in V_G \mid$ there are at least two paths from distinct elements
of S to v not containing any other element of S$\}$.

This form of the problem is due to Karr [Ka], who shows that it is equivalent
to the original formulation due to Shapiro and Saint [SS]. (Actually, this form
is slightly more general than Karr's; Karr satisfies our restriction on S by
stipulating that there is a single $r \in S$ from which every $v \in V_G$ is reachable.)
Karr proves that for each $v \in V_G$ there is one and only one element of $S^+$ from
which v is reachable (and his proof extends directly to our slightly more
general problem).

P-Graph Decomposition Problem. Given G and $S^+$, find, for each $v \in V_G$,
the unique $u \in S^+$ from which v is reachable.

We first show these problems can be solved efficiently. Shapiro and Saint
give an $O(|V_G|^2)$ algorithm, while Karr gives a more complex $O(|V_G| \log |V_G| + |E_G|)$
algorithm. Here we reduce these problems to the computation of a certain dominator
tree, for which there is an almost linear time algorithm as noted in Section 2.2.
(This construction was discovered independently by Tarjan [T2].)

Let h be a new node not in $V_G$, and let G' be the rooted directed graph

$$
G' = (V_G \cup \{h\},\; E_G \cup \{(h,v) \mid v \in S\} - \{(u,v) \mid u \in V_G,\, v \in S\},\; h).
$$

Thus G' is derived from G by adding a new root h, linking h to every node
in S, and removing the edges of G which lead to nodes in S. Let T be the
dominator tree of G'.

LEMMA 2.4. The members of $S^+$ are the sons of h in T.

Proof. IF. Let $v \in S^+$. If $v \in S$ then h is a predecessor of v in G' so
h is the father of v in T. If $v \in S^+ - S$ then by definition of $S^+$ there
are disjoint paths $p_1$, $p_2$ in G from distinct elements of S to v not
containing any other element of S. Clearly $p_1$ and $p_2$ are also paths in G'
since they contain no edge entering a member of S. Then $(h,p_1)$ and $(h,p_2)$
are paths from h to v in G' which have only their endpoints in common, so
v is a son of h in T.

ONLY IF. Suppose v is a son of h in T. If h is a predecessor of
v in G' then $v \in S \subseteq S^+$. Otherwise there are in G' paths $(h,p_1)$ and $(h,p_2)$
from h to v which have only their endpoints in common. Moreover, these paths
contain no element of S except for the first nodes of $p_1$, $p_2$, since no edge of
G' enters an element of S except from h. Hence $p_1$, $p_2$ are disjoint paths
in G' from distinct members of S to v not containing any other element of
S, and hence $v \in S^+$. □

THEOREM 2.4. For each $v \in V_G$, the unique node in $S^+$ from which v is reachable
in G is the unique node which is a son of h and an ancestor of
v in T.

Proof. Let w be that ancestor of v which is a son of h in T. By Lemma
2.4, $w \in S^+$, and clearly v is reachable from w in G since it is reachable from
w in T. Conversely, if $w \in S^+$ is reachable from v in G then w is a son
of h in T by Lemma 2.4, and w must be an ancestor of v since otherwise v
would be reachable from some other member of $S^+$. □
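The reduction to dominators can be sketched in Python as follows. This is a hedged illustration: it follows Lemma 2.4 and Theorem 2.4 by building G' and reading off the sons of h, but it uses a simple iterative set-based dominator computation instead of the almost linear algorithm cited in the text. The function name `p_graph_decompose` and the example graph are hypothetical.

```python
def p_graph_decompose(vertices, edges, S):
    """Return (S_plus, source) for a directed graph and a start set S.

    S_plus    : the sons of the new root h in the dominator tree of G'
    source[v] : the member of S_plus that is an ancestor of v in that tree
    Assumes every vertex is reachable from some member of S.
    """
    h = object()                                   # fresh root, not a vertex
    S = set(S)
    # G': drop edges entering S, then link h to every member of S.
    g_edges = [(u, v) for (u, v) in edges if v not in S] + [(h, v) for v in S]
    preds = {v: [] for v in vertices}
    for u, v in g_edges:
        preds[v].append(u)
    if any(not preds[v] for v in vertices):
        raise ValueError("some vertex is not reachable from S")
    everything = set(vertices) | {h}
    dom = {v: set(everything) for v in vertices}   # dom[v]: dominators of v
    dom[h] = {h}
    changed = True
    while changed:                                 # iterate to a fixed point
        changed = False
        for v in vertices:
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v], changed = new, True
    # v is a son of h exactly when its only dominators are h and v itself.
    S_plus = {v for v in vertices if dom[v] == {h, v}}
    source = {v: next(w for w in dom[v] if w in S_plus) for v in vertices}
    return S_plus, source

# Hypothetical graph: a and b both reach x, x reaches c, a reaches e, c loops back to a.
vertices = ["a", "b", "x", "c", "e"]
edges = [("a", "x"), ("b", "x"), ("x", "c"), ("a", "e"), ("c", "a")]
S_plus, source = p_graph_decompose(vertices, edges, {"a", "b"})
```

Here x joins the completed set because it is reached from both a and b, while c is dominated by x and e by a, so they are assigned to those nodes.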

Now we establish the relation of these problems to the problem of finding
V* and VS as stated above. Fix some V* and VS by choosing one node of
GVG for each value of $\psi^*$ on V consistent with our definition of value sources.

For each rank r, let $G_r = (V_r, E_r)$, where $V_r$ is the set of all nodes of rank r
of a reduced GVG as defined in Section 2.4 and $E_r$ is the edge set derived from
E by

(1) deleting all edges except use-def edges between nodes of rank r,

(2) for those remaining use-def edges (v,u) entering $u \in \hat{V}_r$, substituting
    instead the edge (v,VS(u)),

(3) finally reversing all edges.

Note that any edge of GVG departing from a member of $\hat{V}_r$ enters a node
of rank at most r-1. Let $S_r$ be the set of all value sources of $\hat{V}_r$ plus all nodes
of rank r labeled with input variables which have a departing use-def edge
entering a node of rank greater than r. Note that for each node v of $G_r$,
there is a node in $S_r$ from which v is reachable in $G_r$. Finally, let $S_r^+$
be defined from $S_r$ as in the statement of the P-graph completion problem.

LEMMA 2.5. The members of $S_r^+$ are the value sources of rank r.

Proof. IF. Suppose $v \in S_r^+$.

CASE 1. By definition, all elements of $\{VS(v) \mid v \in \hat{V}_r\}$ are value sources.
Hence we need only consider the case where v is a node of rank r labeled with
an input variable which has a departing use-def edge (v,z) entering a node z
of rank greater than r. Since v is of rank r, v must also have a departing
use-def edge (v,u) leading to a node of rank r. By Lemma 2.3, $\psi(z) \neq \psi(u)$, so
by the definition of $\Gamma_{GVG}$, $\psi(v) = L(v)$ and v is a value source.

CASE 2. Suppose there are in $G_r$ disjoint paths $(x_1, x_2, \ldots, x_j)$ and
$(y_1, y_2, \ldots, y_k)$ from distinct $x_1, y_1 \in S_r$ to v. By construction of
$G_r$, there exist distinct $x'_1, y'_1 \in H(v)$ such that $VS(x'_1) = x_1$, $VS(y'_1) = y_1$, and
$(x_2, x'_1)$ and $(y_2, y'_1)$ are use-def edges, and so $p_1 = (v = x_j, x_{j-1}, \ldots, x_2, x'_1)$
and $p_2 = (v = y_k, y_{k-1}, \ldots, y_2, y'_1)$ are disjoint paths. Now suppose v is not a
value source. Applying Theorem 2.2, there is a value source u (distinct from
v) such that $\psi(v) = \psi(u) = L(u)$. Since $p_1$ and $p_2$ are disjoint they cannot
both contain u. Suppose, without loss of generality, that $p_1$ avoids u.
Then all maximal use-def paths from $x'_1$ contain u. Also, by definition of
$S_r$, $x'_1 = x_1$ and there is a use-def edge (v,z) such that z is not of rank r.
Since any maximal use-def path from z must contain u, rank(z) = rank(u),
implying that u is not of rank r. But, by hypothesis, all maximal use-def
paths from v contain u, so rank(v) = rank(u). This implies that v is not
of rank r, contradicting our assumptions. □

By Karr's proof [K] of the uniqueness of the P-graph decomposition of
$G_r$ on $S_r$, we have

THEOREM 2.5. For all nodes $v \in V$ of rank r and labeled with an input variable,
VS(v) is the unique value source contained on all use-def paths in
$G_r$ from elements of $S_r$ to v.

Thus the problem of computing VS reduces to the problem of decomposing
the reduced global value graph by rank and then constructing dominator trees.
The former can be done in linear time by Algorithm B of Section 2.4, the latter
in almost linear time by [LT].

2.6  Our  Algorithm  for  Symbolic  Program  Analysis 

In this section we pull together the various pieces developed in Sections
2.1-5 to give a unified presentation of our algorithm computing a minimal fixed
point cover. Instead of using the GVG directly to represent $\psi^*$, as suggested
in the beginning of Section 2.5, we more economically represent $\psi^*$ by a dag
D* derived from GVG by collapsing nodes into their value sources; more
precisely D* = (V*,E*,L*) where

V* = $\{VS(v) \mid v \in V\}$ = the set of value sources,

E* = $\{(VS(v), VS(u)) \mid (v,u) \in E$ and L(v) is a function symbol$\}$,

L* is the restriction of L to V*.

Recall from Section 2.1 that rooted dags may be used to represent expressions in
EXP.
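The definition of D* can be made concrete with a short Python sketch. It assumes that the mapping VS is already known and that labels are encoded as tuples `('const', c)`, `('func', op)`, or `('input', name)`; the names `build_dstar` and `expression` and the example nodes are hypothetical. Ordered child lists are used in place of the edge set E* so that operand order is preserved when an expression is read back.

```python
def build_dstar(label, succ, VS):
    """Collapse the GVG into D*: return (V_star, children).

    children[s] lists, in operand order, the value sources of the
    successors of a function node whose value source is s.
    """
    children = {}
    for v in VS:
        if label[v][0] == "func":
            children.setdefault(VS[v], [VS[u] for u in succ[v]])
    return set(VS.values()), children

def expression(v, label, children, VS):
    """Read back the expression represented by (D*, VS(v)) as nested tuples."""
    s = VS[v]
    if label[s][0] != "func":
        return label[s][1]            # constant value or input variable name
    return (label[s][1],) + tuple(
        expression(u, label, children, VS) for u in children[s])

# Hypothetical block: a := X + 1; b := X + 1 (two text expressions, one value).
label = {"x": ("input", "X"), "one1": ("const", 1), "one2": ("const", 1),
         "p1": ("func", "+"), "p2": ("func", "+")}
succ = {"p1": ["x", "one1"], "p2": ["x", "one2"]}
VS = {"x": "x", "one1": "one1", "one2": "one1", "p1": "p1", "p2": "p1"}
V_star, children = build_dstar(label, succ, VS)
```

Both additions collapse onto the single value source p1, so the dag stores the expression X + 1 once even though it occurs twice in the text.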

LEMMA 2.6. For each node $v \in V$, (D*,VS(v)) represents $\psi^*(v)$.

Proof. Note that by definition of VS,
$\psi^*(VS(v)) = \psi^*(v)$
for each $v \in V$, so we need only show for $v \in V^*$ that
(D*,v) represents $\psi^*(v)$.

We proceed by induction on a topological ordering of D*, from leaves to roots.

Basis Step. If v is a leaf of D*, then (D*,v) represents the constant
symbol $L(v) = \psi^*(v)$.

Induction Step. Suppose v is in the interior of D* and (D*,u) represents
$\psi^*(u)$ for all children u of v. Thus v must be labeled in L with a
function symbol $\theta$ and have immediate successors $u_1,\ldots,u_k$ in GVG. Then
$VS(u_1),\ldots,VS(u_k)$ are the children of v in D* and for i = 1,...,k by the
induction hypothesis $(D^*, VS(u_i))$ represents $\psi^*(VS(u_i)) = \psi^*(u_i)$. Thus (D*,v)
represents $(\theta\; \psi^*(u_1) \ldots \psi^*(u_k)) = \psi^*(v)$ by definition of $\Gamma_{GVG}$. □

Our algorithm is given below. As in Section 2.4, we compute $\psi^*$ and VS
in the order of the rank of nodes in V. The array COLOR is used to discover
nodes with the same $\psi^*$.

ALGORITHM C

INPUT   GVG = (V,E,L)

OUTPUT  VS and D* = (V*,E*,L*).

begin
    initialize:
    declare VS, COLOR := arrays of length |V|;
    procedure COLLAPSE(S,u):
        for all $v \in S$ do
            begin
                VS(v) := u;
                if $u \neq v$ then
                    begin
                        for each edge (w,v) entering v do
                            substitute (w,u);
                        for each edge (v,w) departing from v do
                            substitute (u,w);
                        delete [...] from the edge set;
                    end;
            end;

    Compute new labeling L' of V by Algorithm A
        and reduce GVG as described in Section 2.2;
    Compute rank of nodes in V by Algorithm B of Section 2.4;
    for r := 0 to MAX{rank(v) | $v \in V$} do
        begin
            Let $V_r$, $\hat{V}_r$ be the nodes in V, $\hat{V}$ of rank r;
            for all $v \in \hat{V}_r$ do
                if r = 0 then COLOR(v) := L'(v)
                else COLOR(v) := <L(v),$u_1,\ldots,u_k$> where
                    $u_1,\ldots,u_k$ are the current immediate successors of v;
            radix sort nodes in $\hat{V}_r$ by their COLOR;
            for each maximal set $S \subseteq \hat{V}_r$ containing nodes with the same COLOR do
                begin
                    choose some $u \in S$;
                    comment u is made a value source;
                    COLLAPSE(S,u);
                end;
            Let h be some node not in $V_r$;
            $E_r$ := $S_r$ := the empty set {};
            for all $v \in \hat{V}_r$ do add VS(v) to $S_r$;
            for all $v \in V_r - \hat{V}_r$ do
                for each node u which is currently an immediate successor of v do
                    if u is of rank r then add (u,v) to $E_r$
                    else add v to $S_r$;
            Let $T_r$ be the dominator tree of $G_r = (V_r \cup \{h\},\; E_r \cup \{(h,v) \mid v \in S_r\},\; h)$;
            for all sons u of h in $T_r$ do
                begin
                    comment by Theorem 2.4 and Lemma 2.5, u is a value source;
                    COLLAPSE({the descendants of u in $T_r$},u);
                    delete all edges departing from u;
                end;
        end;
    Let V*, E* be the node set and edge list derived from V,E by the
        above collapses;
    for all $v \in V^*$ do L*(v) := L'(v);
end.
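The COLOR step of Algorithm C is a form of value numbering within one rank, and it can be sketched in Python as below. The sketch makes two departures that should be read as assumptions: it groups nodes with a hash table rather than the radix sort named in the algorithm, and it represents "current immediate successors" by looking up VS for each successor, which presumes that all lower ranks have already been processed. The function name `collapse_by_color` and the example nodes are hypothetical.

```python
from collections import defaultdict

def collapse_by_color(nodes_r, label, succ, VS):
    """Assign value sources to the constant and function nodes of one rank.

    label[v] : ('const', c) | ('func', op)
    succ[v]  : ordered successors of v; their entries in VS must already exist
    VS       : mapping from nodes to value sources, updated in place
    """
    groups = defaultdict(list)
    for v in nodes_r:
        if label[v][0] == "const":
            color = label[v]
        else:
            color = (label[v], tuple(VS[u] for u in succ[v]))
        groups[color].append(v)
    for members in groups.values():
        u = members[0]                # u is made a value source
        for v in members:
            VS[v] = u
    return VS

# Hypothetical nodes: three constants at rank 0, three additions at rank 1.
label = {"one1": ("const", 1), "one2": ("const", 1), "two": ("const", 2),
         "p1": ("func", "+"), "p2": ("func", "+"), "p3": ("func", "+")}
succ = {"p1": ["x", "one1"], "p2": ["x", "one2"], "p3": ["x", "two"]}
VS = {"x": "x"}                       # x is an input variable that is its own source
collapse_by_color(["one1", "one2", "two"], label, succ, VS)
collapse_by_color(["p1", "p2", "p3"], label, succ, VS)
```

After the rank 0 pass the two copies of the constant 1 share a value source, which is what allows the two additions X + 1 to receive the same COLOR at rank 1.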


THEOREM 2.6. Algorithm C is correct and can be implemented in almost linear
time.

Proof. The correctness of Algorithm C follows directly from Theorems 2.4,
2.5 and Lemmas 2.5, 2.6.

In addition, we must show that Algorithm C can be implemented in almost
linear time. The storage cost of GVG is linear in |V| + |E|. The initialization
of Algorithm C costs time linear in |N| + |A|. Algorithms A and B cost
linear time by Theorems 2.1 and 2.3, respectively. The time cost of the r'-th
execution of the main loop, exclusive of the computation of $T_r$, is linear in
$|V_r| + |E_r|$, plus the sum of the outdegree of all $v \in V_r - \hat{V}_r$. (Here we assume
that elements in the range of L' are representable in a fixed number of
machine words and that the number of argument-places of function signs is
bounded by a fixed constant, so a radix sort can be used to partition by
COLOR.) The computation of the dominator tree $T_r$ requires by [LT] time cost
almost linear in $|V_r| + |E_r|$. Thus, the total time cost is almost linear in
|V| + |E|. □

This completes the presentation of our algorithm for computing a minimal
fixed point cover $\psi^*$.

3.  FURTHER  WORK 

3.1  Improving the Efficiency of Our Algorithm for Symbolic Program Analysis

The primary goal of this paper was to construct the minimal fixed point
$\psi^*$ of the functional $\Psi$. Actually, $\Psi$ was defined relative to a program $\Pi'$
derived from the original program $\Pi$ by adding dummy assignments of the form
X := X at every block where some program variable $X \in \Sigma$ is not assigned. This
does not change the semantics of the program but requires the addition of
$O(|\Sigma||N|)$ text expressions whose covers we are not actually concerned with.
In practice we need the covers given by $\psi^*$ only over the domain of the original
text expressions of $\Pi$.

The algorithms of Section 2 allow us to construct, for any global value
graph GVG, the unique minimal element of $\Gamma_{GVG}$ in space linear in the size of
GVG and time almost linear in the size of GVG. Section 2.1 defines the
standard global value graph $GVG_0$ which has size $O(|\Sigma||A| + \ell)$ and with the
property that $\psi^*$ is the minimal element of $\Gamma_{GVG_0}$. We describe here how we may
construct a global value graph $GVG_1$ of size $O(d|A| + \ell)$ where d is a parameter
of the program which is often of order 1 for block-structured programs but may grow
to $|\Sigma|$. The construction of $GVG_1$ can be done by a preprocessing stage
of [RT] costing a number of bit vector steps almost linear in $|A| + \ell$.
Thus this preprocessing stage offers no theoretical advantage but in practice
may often lead to a global value graph of size linear in the program and flow
graph. Appendix III shows $GVG_1$ has the property that the minimal element of
$\Gamma_{GVG_1}$ is the minimal fixed point of the functional $\Psi$ defined in Section 2.4.
In contrast to the iterative method, which for a large class of programs has
storage cost $\Omega(\ell|N|)$ and time cost $\Omega(\ell|N|^2)$, our direct method has storage
cost linear in the size of $GVG_1$ and time cost almost linear in the size of $GVG_1$.

A path is m-avoiding if the path does not contain node m. Consider blocks
m, n in the control flow graph such that m dominates n. A program variable
X ∈ Σ is definition-free between m and n if (1) m = n or (2) m properly
dominates n and X is not assigned to on any m-avoiding control path from an
immediate successor of m to an immediate predecessor of n (otherwise X is
defined between m and n). We define a function W from text expressions
which are input variables to blocks of the control flow graph. For each input
variable X^n, W(X^n) = m, where m is the first block on the dominator chain of
the control flow graph from the start block s to n such that X is
definition-free between m and n. An algorithm in [RT] computes W in a
number of bit vector steps almost linear in |N| + ℓ.
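The definition of W can be illustrated with a small sketch. This is a naive graph search written only to make the definition concrete; it is not the bit vector algorithm of [RT] and makes no claim to its efficiency. The names `succ` (successor lists of the control flow graph), `idom` (immediate dominator of each block) and `assigns` (variables assigned in each block) are hypothetical inputs introduced for the illustration.

```python
def avoiding_reach(adj, starts, avoid):
    """Nodes reachable from `starts` along `adj` without ever entering `avoid`."""
    seen = set()
    stack = [x for x in starts if x != avoid]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(w for w in adj.get(u, ()) if w != avoid and w not in seen)
    return seen


def definition_free(succ, pred, assigns, var, m, n):
    """True if `var` is definition-free between m and n (m is assumed to dominate n)."""
    if m == n:
        return True
    # Blocks lying on some m-avoiding path from a successor of m
    # to a predecessor of n: forward-reachable and backward-reachable.
    forward = avoiding_reach(succ, succ.get(m, ()), m)
    backward = avoiding_reach(pred, pred.get(n, ()), m)
    return all(var not in assigns.get(b, ()) for b in forward & backward)


def w_block(succ, idom, assigns, var, n, start):
    """First block m on the dominator chain start..n with var definition-free between m and n."""
    pred = {}
    for u, ws in succ.items():
        for w in ws:
            pred.setdefault(w, []).append(u)
    chain = [n]
    while chain[-1] != start:          # walk up the dominator tree to the start block
        chain.append(idom[chain[-1]])
    for m in reversed(chain):          # scan from the start block down to n
        if definition_free(succ, pred, assigns, var, m, n):
            return m
```

On a diamond-shaped flow graph s → {a, b} → n in which X is assigned in block a, the sketch gives W(X^n) = n, while a variable assigned only in s gets W(Y^n) = s.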

It will be convenient to assume that for each text expression which is an
input variable X^n such that W(X^n) = n, X is assigned to at each block m
immediately preceding n. We must add O(d|N|) dummy assignments to accomplish
this; d is often constant for block-structured programs but may grow to |Σ|.

Let GVG_0 = (V,E,L) be the standard global value graph defined in Section 2.1.
Let E_W be the set of pairs of vertices (u,v) ∈ V² such that

(1) v is labeled with an input variable X^n,

(2) u represents an output expression ℰ(X,m),

(3) either (a) W(X^n) = n and m is an immediate predecessor of n in
F, or (b) W(X^n) = m properly dominates n.

Note that E_W contains O(d|A| + ℓ) edges. Let E_U be the use-def
edges of GVG_0. Let GVG_1 be the global value graph with vertices V,
labeling L, and edges E_1 = E_W ∪ (E − E_U). Let d = |E_W|/|A| and observe
that d ≤ |Σ|. Then |E_W| = O(d|A|) and so GVG_1 is of size O(|E_1| + ℓ) =
O(d|A| + ℓ).

Appendix III proves that Γ_GVG1 has a minimal fixed point which contains in
its domain the minimal fixed point cover ψ*. Thus our algorithm given in
Section 2 can be used to construct ψ* in time almost linear in the size of
GVG_1.

3.2  Improved  Covers  for  Restricted  Domains 

We show in Appendix II that there is no finite algorithm for computing
minimal covers in the arithmetic domains. However, the minimal fixed point
covers computed by our algorithm in Section 2 can be improved by use of
domain-specific identities.

In [R1] our methods for computing covers are extended to programs which
operate on records in a language such as PASCAL or LISP 1.0. There we use the
domain-specific fact that selections on records (such as car or cdr in LISP)
yield subcomponents for which we can derive covering expressions.

APPENDIX I


Graph  Theoretic  Notions 

A  digraph  G=  (V,E)  consists  of  a  set  V  of  elements  called  nodes 
and  a  set  E  of  ordered  pairs  of  nodes  called  edges.  The  edge  (u,v)  departs 
from u and enters v. We say u is an immediate predecessor of v and v
is an immediate successor of u. The outdegree of a node v is the number of
immediate successors of v and the indegree is the number of immediate
predecessors of v.

A path from u to w in G is a sequence of nodes p = (u = v_1, v_2, ..., v_k = w)
where (v_i, v_{i+1}) ∈ E for 1 ≤ i < k. The length of the path p is k−1.

The path p may be built by composing subpaths:

p = (v_1, ..., v_i) · (v_i, ..., v_k).

The path p is a cycle if u = w. A strongly connected component of
G is a maximal set of nodes such that each pair in the set is contained in a
common cycle.

A node u is reachable from a node v if either u = v or there is a
path from v to u.
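As an illustration of reachability, the following minimal sketch collects every node that can be reached from a given node by following edges forward. The adjacency-list dictionary `succ` and the function name are assumptions made for the example and are not notation from the paper.

```python
def reachable_from(succ, v):
    """Return the set of nodes reachable from v by following edges (v itself included)."""
    seen = {v}
    stack = [v]
    while stack:
        x = stack.pop()
        for y in succ.get(x, ()):   # immediate successors of x
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen
```

Each node and edge is visited at most once, so the search runs in time proportional to the size of the part of the graph it explores.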

We  shall  require  various  sorts  of  special  digraphs.  A  rooted  digraph 
(V,E,r)  is  a  triple  such  that  (V,E)  is  a  digraph  and  r  is  a  distinguished 
node  in  V,  the  root.  A  flow  graph  is  a  rooted  digraph  such  that  the  root  r 
has  no  predecessors  and  every  node  is  reachable  from  r.  A  digraph  is  labeled 
if  it  is  augmented  with  a  mapping  whose  domain  is  the  vertex  set.  An  oriented 
digraph  is  a  digraph  augmented  with  an  ordering  of  the  edges  departing  from  each 
node. We shall allow any given edge of an oriented graph to appear more than
once in the edge list.

A digraph G is acyclic if G contains no cycles, cyclic otherwise.

Let G be acyclic. If u is reachable from v, u is a descendant of v
and v is an ancestor of u (these relations are proper if u ≠ v). Nodes
with no proper ancestors are called roots and nodes with no proper descendants
are leaves. Immediate successors are called sons. Any total ordering
consistent with either the descendant or the ancestor relation is a
topological ordering of G.
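A topological ordering consistent with the ancestor relation can be sketched as follows. The code uses the standard queue-based method (often attributed to Kahn), chosen here only for brevity; the function name and the dictionary representation `succ` are assumptions of the example.

```python
from collections import deque


def topological_order(succ):
    """Return the nodes of an acyclic digraph so that every edge goes left to right."""
    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    indegree = {v: 0 for v in nodes}
    for ws in succ.values():
        for w in ws:
            indegree[w] += 1
    # Start from the roots (nodes with no immediate predecessors).
    ready = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in succ.get(v, ()):
            indegree[w] -= 1
            if indegree[w] == 0:    # all immediate predecessors already placed
                ready.append(w)
    if len(order) != len(nodes):
        raise ValueError("graph is cyclic; no topological ordering exists")
    return order
```

Reversing the returned list gives an ordering consistent with the descendant relation instead.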

A  flow  graph  T  is  a  tree  if  every  node  v  other  than  the  root  has  a 
unique  immediate  predecessor,  the  father  of  v.  A  topological  ordering  of  a 
tree  is  a  preordering  if  it  proceeds  from  the  root  to  the  leaves  and  is  a 
postordering  if  it  begins  at  the  leaves  and  ends  at  the  root.  A  spanning  tree 
of  a  rooted  digraph  G=  (V,E,r)  is  a  tree  with  node  set  V,  an  edge  set 
contained  in  E,  and  a  root  r. 

Let G = (V,E,r) be a flow graph. A node u dominates a node v if
every path from the root to v includes u (u properly dominates v if, in
addition, u ≠ v). It is easily shown that there is a unique tree T_G, called
the dominator tree of G, such that u dominates v in G iff u is an
ancestor of v in T_G. The father of a node in the dominator tree is the
immediate dominator of that node.
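The dominance relation can be made concrete with a simple iterative set computation. This sketch is deliberately naive and is not an almost linear time dominator algorithm; it only illustrates the definitions. It assumes its input is a flow graph, so every node other than the root has at least one immediate predecessor. The function names and the dictionary `succ` are hypothetical.

```python
def dominators(succ, r):
    """Map each node v of a flow graph with root r to the set of nodes dominating v."""
    nodes = set(succ) | {w for ws in succ.values() for w in ws}
    pred = {v: set() for v in nodes}
    for u, ws in succ.items():
        for w in ws:
            pred[w].add(u)
    dom = {v: set(nodes) for v in nodes}   # start from "everything dominates v"
    dom[r] = {r}
    changed = True
    while changed:
        changed = False
        for v in nodes - {r}:
            # u dominates v iff u = v or u dominates every immediate predecessor of v.
            new = set.intersection(*(dom[p] for p in pred[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom


def immediate_dominators(dom, r):
    """Father of each non-root node in the dominator tree."""
    idom = {}
    for v, ds in dom.items():
        if v != r:
            # Proper dominators of v form a chain; the closest one has the most dominators.
            idom[v] = max(ds - {v}, key=lambda u: len(dom[u]))
    return idom
```

For the flow graph r → {a, b} → c ⇄ d, the sketch reports r as the immediate dominator of a, b and c, and c as the immediate dominator of d.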

All of the above properties of digraphs may be computed very efficiently.
An algorithm has linear time cost if the algorithm runs in time O(n) on input
of length n and has almost linear time cost if the algorithm runs in time
O(n α(n,n)) where α is the extremely slow growing function of [T3] (α is
related to a functional inverse of Ackermann's function). Using adjacency
lists, a digraph G = (V,E) may be represented in space O(|V| + |E|). Knuth
[Kn1] gives a linear time algorithm for computing a topological ordering of an
acyclic digraph. Lengauer and Tarjan [LT] present linear time algorithms for
computing the strongly connected components of a digraph and a spanning tree
and an almost linear time algorithm for computing the dominator tree of a flow
graph.


APPENDIX  II 


Unsolvability  of  Various  Code  Improvements 

The introduction listed a number of code improvements which are related
to the problem of determining minimal covers of text expressions. Here we show
that even constant propagation, the most fundamental of these improvements, is
recursively unsolvable for programs evaluated within the arithmetic domain.
This rules out the possibility of finding minimal covers even in simple
domains. Previously, Kam and Ullman [KU2] have shown related global flow
problems to be unsolvable in an abstract, nonarithmetic domain.


THEOREM A1. In the arithmetic domain, it is an undecidable problem to discover
if a text expression is covered by a constant symbol.


Proof. The method of proof will be to reduce Hilbert's 10th problem to that
of the discovery of text expressions covered by constant symbols within the
arithmetic domain (Z, I_Z).

Let {X_0, X_1, X_2, ..., X_k} be a set of variables, where k > 5. Matijasevic
[M] has shown that the problem of determining if a polynomial
Q(X_1, X_2, ..., X_k) has a root in the natural numbers (Hilbert's 10th
problem) is recursively unsolvable.


Consider the flow graph of Figure A2. Let t be the text expression
⌊X_0^f / (1 + Q(X_1^f, ..., X_k^f))⌋ located at block f. We show t is covered
by a constant symbol iff Q has no root in the natural numbers.

For any control path p from the start block s to the final block f
and for i = 0,1,...,k let X_i(p) = I(VALUE(X_i^f, p)) = the value of X_i just
on entry to f relative to p. Also, let X(p) = (X_1(p), ..., X_k(p)). Observe
that for any k-tuple of natural numbers z, there is a control path p from s to
f such that z = X(p).


IF. Suppose Q has no root in the natural numbers. Then for each
control path p from s to f, Q(X_1(p), ..., X_k(p)) ≠ 0, so VALUE(t,p) = 0.
Thus, t is covered by the constant 0.

ONLY IF. Suppose Q has a root z in the natural numbers. Then it is
possible to find execution paths p and q from s to f such that
z = X(p) = X(q), X_0(p) = 0 and X_0(q) = 1. Hence VALUE(t,p) = 0 and
VALUE(t,q) = 1, so t is not covered by a constant symbol. □

COROLLARY A1. In the arithmetic domain, the following global flow problems are
unsolvable: discovery of minimal covers, birth and safe points of code motion,
redundant text expressions, and loop invariants.

Proof. It is easy to show that the problem of discovery of constant text
expressions reduces to each of these problems. Add the edge (f, n_1) to the
control flow graph F of Figure A2, so t is contained in a cycle of F. Then
by Theorem A1, Q has no root in the natural numbers iff t is covered by 0
iff s is the birth point of t
iff s is the safe point of t
iff t is redundant on entry to f
iff t is a constant loop invariant.

Thus, the problem of discovery of whether text expression t is covered by a
constant reduces to each of the above global flow problems. (Note that the
problem of safety of code motion is also hard for other reasons; if we add the
text expression t' = ⌊1/Q(X_1^f, ..., X_k^f)⌋ to block f then Q has no root in
the natural numbers iff t' is safe at f.) □


APPENDIX  III 


Fixed Points of Γ_GVG

We define a partial mapping min: EXP² → EXP such that for all ℰ, ℰ' ∈ EXP,

ℰ min ℰ' = ℰ   if origin(ℰ) properly dominates origin(ℰ'),
         = ℰ'  if origin(ℰ') properly dominates origin(ℰ),

or if origin(ℰ) = origin(ℰ') and

(i) if ℰ = ℰ' then ℰ min ℰ' = ℰ = ℰ', or
(ii) if ℰ is a constant symbol and ℰ' is a function application,
then ℰ min ℰ' = ℰ' min ℰ = ℰ, or
(iii) if ℰ, ℰ' are function applications (θ ℰ_1 ... ℰ_k), (θ ℰ'_1 ... ℰ'_k)
respectively, and ℰ''_i = ℰ_i min ℰ'_i is defined for i = 1,...,k,
then ℰ min ℰ' = (θ ℰ''_1 ... ℰ''_k), and otherwise, ℰ min ℰ' is undefined.
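The case analysis in the definition of min can be sketched as a recursive function. The tuple encoding of expressions, the toy `origin` function, and the toy domination test below are all assumptions made for this illustration; in particular the definition of origin is not part of this appendix, and the sketch simply takes it as a caller-supplied function. An undefined result is represented by `None`.

```python
def expr_min(a, b, origin, properly_dominates):
    """Partial 'min' on expressions; returns None where the mapping is undefined."""
    oa, ob = origin(a), origin(b)
    if properly_dominates(oa, ob):
        return a
    if properly_dominates(ob, oa):
        return b
    if oa != ob:
        return None
    if a == b:                                        # case (i)
        return a
    if a[0] == "const" and b[0] == "app":             # case (ii)
        return a
    if b[0] == "const" and a[0] == "app":             # case (ii), symmetric
        return b
    if (a[0] == "app" and b[0] == "app"
            and a[1] == b[1] and len(a[2]) == len(b[2])):   # case (iii)
        parts = [expr_min(x, y, origin, properly_dominates)
                 for x, y in zip(a[2], b[2])]
        if all(p is not None for p in parts):
            return ("app", a[1], tuple(parts))
    return None


# Toy setting (hypothetical): blocks on a single dominator chain s -> a -> b.
depth = {"s": 0, "a": 1, "b": 2}


def properly_dominates(m, n):
    return depth[m] < depth[n]


def origin(e):
    # Assumed for the example only: a variable X^n originates at n, a constant
    # at the start block, an application at the deepest origin of its arguments.
    if e[0] == "var":
        return e[2]
    if e[0] == "const":
        return "s"
    return max((origin(x) for x in e[2]), key=depth.get)


x_a = ("var", "X", "a")
y_b = ("var", "Y", "b")
one = ("const", 1)
```

Expressions are encoded as `("const", c)`, `("var", X, n)` for the input variable X^n, and `("app", θ, args)` for a function application.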

We extend min to the partial mapping from pairs of elements of Γ_GVG
to Γ_GVG defined thus: for ψ, ψ' ∈ Γ_GVG, if for all v ∈ V,
ψ(v) min ψ'(v) = ψ(v) is defined then ψ min ψ' = ψ, and otherwise ψ min ψ'
is undefined.

Let GVG be an arbitrary global value graph. We show that Γ_GVG is
a semilattice. We require two technical lemmas:

LEMMA A1. For any v ∈ V labeled with an input variable and any control path
p from the start block s to loc(v), there is a maximal use-def path q
from v such that all the nodes in q have distinct loc values in p.

Proof. We consider (v) to be a trivial use-def path. Suppose we have
constructed a use-def path (v = u_1, ..., u_i) such that loc(u_i),
loc(u_{i−1}), ..., loc(u_1) are distinct blocks occurring in this order in p.
If u_i is not labeled with an input variable (and thus has no departing value
edges) then (v = u_1, ..., u_i) is a maximal use-def path. Otherwise, let p_i
be the subpath of p from s to the first occurrence of block loc(u_i) and let
(u_i, u_{i+1}) be a use-def edge such that loc(u_{i+1}) occurs strictly before
loc(u_i) in p. Then (v = u_1, ..., u_i, u_{i+1}) is a use-def path and
loc(u_{i+1}) is distinct from blocks loc(u_1), ..., loc(u_i). The result thus
follows from induction on the length of p. □

LEMMA A2. For any ψ ∈ Γ_GVG and v ∈ V, origin(ψ(v)) dominates loc(v).

Proof by contradiction. Suppose for some v ∈ V, origin(ψ(v)) does not
dominate loc(v). Hence, there must be an input variable X^n occurring in ψ(v)
such that n does not dominate loc(v), and so there is an n-avoiding path p
from the start block s to loc(v). Also, there must exist some u ∈ V labeled
with an input variable and also located at block n, such that ψ(u) = X^n. By
Lemma A1, we can construct a maximal use-def path (u = u_1, ..., u_k) such
that loc(u_1), ..., loc(u_k) are distinct blocks in p. Let j be the maximal
integer ≤ k such that ψ(u_1) = ··· = ψ(u_j). If L(u_j) is an input variable,
then ψ(u_j) = L(u_j) = X^n, so loc(u_j) = n is contained in p, contradicting
the assumption that p avoids n. Otherwise, if L(u_j) is not an input
variable then neither is ψ(u) = ψ(u_j), a contradiction with the assumption
that ψ(u) = X^n. □


THEOREM A2. Γ_GVG is a semilattice.

Proof. It is sufficient to show min is well defined over Γ_GVG. We proceed
by induction. Suppose for ψ, ψ' ∈ Γ_GVG and some ℰ in the domain of Ψ,
ψ(u) min ψ'(u) is defined for all u ∈ V such that ψ(u) is a proper
subexpression of ℰ. Consider some text expression v such that ψ(v) = ℰ. By
Lemma A2, both origin(ψ(v)) and origin(ψ'(v)) are contained on all control
paths from the start block s to loc(v), so we may assume without loss of
generality that origin(ψ(v)) dominates origin(ψ'(v)). Observe that
ψ(v) min ψ'(v) = ψ(v) if origin(ψ(v)) properly dominates origin(ψ'(v)), so
we further assume that origin(ψ(v)) = origin(ψ'(v)).

CASE 1. If L(v) is a constant symbol c then ψ(v) = ψ'(v) = c so
ψ(v) min ψ'(v) = c.

CASE 2. Suppose L(v) is a function symbol θ and v has immediate successors
u_1, ..., u_k. By the induction hypothesis ℰ''_i = ψ(u_i) min ψ'(u_i) is
defined for i = 1,...,k. Hence ψ(v) min ψ'(v) is the reduced expression
derived from (θ ℰ''_1 ... ℰ''_k).

CASE 3. Otherwise, suppose L(v) is an input variable. Let p be a control
path from the start block s to loc(v). By Lemma A1, we can construct a
maximal use-def path (v = u_1, ..., u_k) such that for i = 1,...,k each
loc(u_i) is contained in p. Let j be the maximal integer such that
ψ(u_1) = ··· = ψ(u_j).

CASE 3a. If ψ'(v) = ψ'(u_1) = ··· = ψ'(u_i) ≠ ψ'(u_{i+1}) for some i,
1 ≤ i < j, then by the definition of Γ_GVG, ψ'(v) = ψ'(u_i) = L(u_i). Hence
origin(ψ'(v)) = n_i ≠ n_j = origin(ψ(v)), contradicting our assumption that
origin(ψ'(v)) = origin(ψ(v)).

CASE 3b. Otherwise, suppose ψ'(v) = ψ'(u_1) = ··· = ψ'(u_j), so we have
ψ(v) = ψ(u_j) and ψ'(v) = ψ'(u_j). Applying Cases 1 and 2,
ψ(v) min ψ'(v) = ψ(u_j) min ψ'(u_j) is defined if L(u_j) is either a constant
symbol or function symbol, so we assume L(u_j) is an input variable. Since j
is maximal, ψ(v) = ψ(u_j) = L(u_j). If ψ'(v) = ψ'(u_j) = L(u_j) then
ψ(v) min ψ'(v) = L(u_j). Otherwise, suppose ψ'(u_j) ≠ L(u_j). For each
use-def edge (u_j, v'), by the definition of Γ_GVG, ψ'(u_j) = ψ'(v') and by
Lemma A2, origin(ψ'(v')) dominates loc(v'). Hence
origin(ψ'(v)) = origin(ψ'(u_j)) is distinct from origin(ψ(v)), contradicting
our assumption that origin(ψ'(v)) = origin(ψ(v)). □


Theorem  A2  immediately  implies  that: 


COROLLARY A2. Γ_GVG has a unique minimal element min Γ_GVG.

Let GVG_0 be the standard global value graph defined in Section 2.1. We
have shown that Γ_GVG0 is a finite semilattice and hence has a minimal
element. We now show that this minimal element is the unique minimal fixed
point of Ψ.


THEOREM A3. ψ*, the minimal fixed point of Ψ, is identical to the unique
minimal element of Γ_GVG0.

Proof. Observe that any fixed point of Ψ is an element of Γ_GVG0. By
Corollary A2, Γ_GVG0 has a unique minimal element ψ = min Γ_GVG0. Suppose ψ
is not a fixed point of Ψ. Observe that since ψ ∈ Γ_GVG0, for each input
variable X^n, if ψ(X^n) ≠ X^n then Ψ(ψ)(X^n) = ψ(X^n). Hence there is an input
variable X^n such that ψ(X^n) = X^n but Ψ(ψ)(X^n) = δ where δ = ψ(ℰ(X,m))
for all blocks m immediately preceding block n in the control flow graph F.

We are going to construct a mapping ψ̂ ∈ Γ_GVG0, distinct from ψ, whose
existence will contradict our assumption that ψ is the minimal element of
Γ_GVG0. For each text expression t, let ψ̂(t) be derived from ψ(t) by
substituting δ for each occurrence of X^n, and then reducing the resulting
expression. We now show ψ̂ ∈ Γ_GVG0. Consider any input variable Y^n'.

CASE a. Suppose ψ(Y^n') = Y^n'. If Y^n' ≠ X^n then ψ̂(Y^n') = Y^n'.
Otherwise, if Y^n' = X^n then for each block m immediately preceding block
n' = n, ψ̂(Y^n') = ψ(ℰ(Y,m)) = δ, and since X^n is not contained in δ,
ψ̂(Y^n') = ψ̂(ℰ(Y,m)) = δ.

CASE b. If ψ(Y^n') ≠ Y^n' then for each block m immediately preceding n' in
F, ψ(Y^n') = ψ(ℰ(Y,m)) so ψ̂(Y^n') = ψ̂(ℰ(Y,m)). Thus ψ̂ ∈ Γ_GVG0.
For each block m immediately preceding n in F, δ = ψ̂(X^n) = ψ(ℰ(X,m)), so

origin(ψ̂(X^n)) = origin(ψ(ℰ(X,m)))
dominates loc(ℰ(X,m)), by Lemma A2
= m,

and hence origin(ψ̂(X^n)) properly dominates origin(ψ(X^n)). This implies that
ψ is not the minimal element of Γ_GVG0, a contradiction. □

Let GVG_1 be the global value graph defined in Section 3.1. Let ψ⁺ be
the minimal fixed point of Γ_GVG1. By Theorem A3, ψ* is the minimal fixed
point of Γ_GVG0. As in Section 3.1, we assume that for each text expression
which is an input variable X^n such that W(X^n) = n, X is assigned to at each
block immediately preceding n. Thus ψ⁺ and ψ* have the same domain.

THEOREM A4. ψ⁺ = ψ*.

Proof. Clearly ψ⁺ ∈ Γ_GVG0. Suppose, however, that ψ⁺ ≠ ψ*. Then since ψ*
is the unique minimal fixed point of Γ_GVG0, there is some v such that
origin(ψ*(v)) properly dominates origin(ψ⁺(v)). Choose v so that ψ⁺(v)
has minimal rank and origin(ψ⁺(v)) is also minimal with respect to
domination ordering. Now v is certainly not a constant. If v is of
the form (θ u_1 ... u_k) then ψ⁺(u_i) ≠ ψ*(u_i) for some i, such that
rank(u_i) < rank(v), a contradiction with the assumption that v has
minimal rank. Otherwise, suppose v is an input variable X^n. Since
origin(ψ⁺(v)) is also minimal, we can assume that ψ⁺(v) = v. Then X cannot
be definition-free from origin(ψ*(v)) to n, and there must be use-def edges
(v,u_1), (v,u_2) such that ψ⁺(u_1) ≠ ψ⁺(u_2). But this implies also that
ψ*(v) = v, a contradiction. □


